Browsing by Author "Pipiras, Vladas"
Now showing 1 - 8 of 8
Results Per Page
Sort Options
- Learning attribute and homophily measures through random walksPublication . Antunes, Nelson; Banerjee, Sayan; Bhamidi, Shankar; Pipiras, VladasWe investigate the statistical learning of nodal attribute functionals in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models, based on preferential attachment is studied (model class P), where new nodes form connections dependent on both their attribute values and popularity as measured by degree. An associated model class U is described, which is amenable to theoretical analysis and gives access to asymptotics of a host of functionals of interest. Settings where asymptotics for model class U transfer over to model class P through the phenomenon of resolvability are analyzed. For the statistical learning, we consider several canonical attribute agnostic sampling schemes such as Metropolis-Hasting random walk, versions of node2vec (Grover and Leskovec, 2016) that incorporate both classical random walk and non-backtracking propensities and propose new variants which use attribute information in addition to topological information to explore the network. Estimators for learning the attribute distribution, degree distribution for an attribute type and homophily measures are proposed. The performance of such statistical learning framework is studied on both synthetic networks (model class P) and real world systems, and its dependence on the network topology, degree of homophily or absence thereof, (un)balanced attributes, is assessed.
- Probabilistic sampling of finite renewal processesPublication . Antunes, Nelson; Pipiras, VladasConsider a finite renewal process in the sense that interrenewal times are positive i.i.d. variables and the total number of renewals is a random variable, independent of interrenewal times. A finite point process can be obtained by probabilistic sampling of the finite renewal process, where each renewal is sampled with a fixed probability and independently of other renewals. The problem addressed in this work concerns statistical inference of the original distributions of the total number of renewals and interrenewal times from a sample of i.i.d. finite point processes obtained by sampling finite renewal processes. This problem is motivated by traffic measurements in the Internet in order to characterize flows of packets (which can be seen as finite renewal processes) and where the use of packet sampling is becoming prevalent due to increasing link speeds and limited storage and processing capacities.
- Regularisation and central limit theorems for an inverse problem in network sampling applicationsPublication . Antunes, Nelson; Jacinto, Gonçalo; Pipiras, VladasAn inverse problem motivated by packet sampling in communication networks and edge sampling in directed complex networks is studied through the operator perspective. The problem is shown to be ill-posed, with the resulting naive estimator potentially having very heavy tails, satisfying non-Gaussian central limit theorem and showing poor statistical performance. Regularisation of the problem leads to the Gaussian central limit theorem and superior performance of the regularised estimator, as a result of desirable properties of underlying operators. The limiting variance and convergence rates of the regularised estimator are also investigated. The results are illustrated on synthetic and real data from communication and complex networks.
- Regularized inversion of flow size distributionPublication . Antunes, Nelson; Pipiras, Vladas; Jacinto, GoncaloIn this paper, we revisit the estimation of the size distribution of packet flows in Internet traffic through an inversion approach for several packet sampling schemes which are based on probabilistic sampling (PS). We first study the statistical properties of the previously introduced inversion estimator in its general form and make connections to the singular value decomposition. This motivates the use of a regularization technique in the estimation of the flow size distribution. More specifically, a penalized weighted least square approach is proposed. We compare theoretically the penalized estimator under simplified assumptions against the (non-penalized) inversion approach in order to explain differences in their statistical behaviors. A data study with two real traces shows that the proposed penalized estimator outperforms the inversion estimator for all sampling schemes, corroborating the theoretical analysis. This work reveals that the simplest sampling schemes based on PS, that do not work with small sampling probabilities under the inversion approach, can be used with the penalized approach. Furthermore, the penalized approach allows considering smaller packet sampling rates for all the other sampling schemes.
- Sampling Based Estimation of In-Degree Distribution for Directed Complex NetworksPublication . Antunes, Nelson; Bhamidi, Shankar; Guo, Tianjian; Pipiras, Vladas; Wang, BangThe focus of this work is on estimation of the in-degree distribution in directed networks from sampling network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of that node are visible, that is, the in-degree of that node is not observed. The suggested estimation of the in-degree distribution is based on two approaches. The inversion approach exploits the relation between the original and sample in-degree distributions, and can estimate the bulk of the in-degree distribution, but not the tail of the distribution. The tail of the in-degree distribution is estimated through an asymptotic approach, which itself has two versions: one assuming a power-law tail and the other for a tail of general form. The two estimation approaches are examined on synthetic and real networks, with good performance results, especially striking for the asymptotic approach. Supplementary files for this article are available online.
- Sampling methods and estimation of triangle count distributions in large networksPublication . Antunes, Nelson; Guo, Tianjian; Pipiras, VladasThis paper investigates the distributions of triangle counts per vertex and edge, as a means for network description, analysis, model building, and other tasks. The main interest is in estimating these distributions through sampling, especially for large networks. A novel sampling method tailored for the estimation analysis is proposed, with three sampling designs motivated by several network access scenarios. An estimation method based on inversion and an asymptotic method are developed to recover the entire distribution. A single method to estimate the distribution using multiple samples is also considered. Algorithms are presented to sample the network under the various access scenarios. Finally, the estimation methods on synthetic and real-world networks are evaluated in a data study.
- Skampling for the flow duration distributionPublication . Antunes, Nelson; Pipiras, Vladas; Veitch, Darryl; Bolla, R.; Ciucu, F.This paper concerns the problem of estimating the Internet flow duration distribution from indirect measurements due to network constraints. The aim is to estimate the distribution from observing: the possible superpositions (collisions) of sampled flow durations, the flow arrivals-to-departures times without identification of sampled flows and the number of sampled flows in progress. For each type of data available, we present estimators of the flow duration distribution, formulating the problem in queueing system terms. We also propose data streaming algorithms using sampling and sketching (through counters) to obtain the considered partial information from flows. At the core of this skampling (i.e. sampling and sketching) approach is the ability to tune the flow sampling probability for "optimal" flow load onto sketch entries (queues). Finally, we present numerical results comparing the different estimators of the flow duration distribution using two real Internet traces.
- Small and large scale behavior of moments of poisson cluster processesPublication . Antunes, Nelson; Pipiras, Vladas; Abry, Patrice; Veitch, DarrylPoisson cluster processes are special point processes that find use in modeling Internet traffic, neural spike trains, computer failure times and other real-life phenomena. The focus of this work is on the various moments and cumulants of Poisson cluster processes, and specifically on their behavior at small and large scales. Under suitable assumptions motivated by the multiscale behavior of Internet traffic, it is shown that all these various quantities satisfy scale free (scaling) relations at both small and large scales. Only some of these relations turn out to carry information about salient model parameters of interest, and consequently can be used in the inference of the scaling behavior of Poisson cluster processes. At large scales, the derived results complement those available in the literature on the distributional convergence of normalized Poisson cluster processes, and also bring forward a more practical interpretation of the so-called slow and fast growth regimes. Finally, the results are applied to a real data trace from Internet traffic.