This article develops limit laws for network-sampling-based estimates of subgraph counts and the clustering coefficient of a large population network, and uses them for predictive inference. A model-based approach is used, in which the population network is assumed to be generated from a sparse Stochastic Block Model (SBM). To quantify the effects of node sampling under resource constraints, a sparse Bernoulli node sampling scheme is introduced, in which the node selection probability decays to zero as the population size increases. Both induced and ego-centric network formation approaches are explored. Quantitative bounds on the speed of normal approximation for estimated subgraph counts are obtained in a joint model- and design-based asymptotic framework. These bounds show that inference accuracy depends on model sparsity, sampling sparsity, and features of the target subgraph such as its edge density and minimum vertex cover size. We find that the ego-centric approach can handle higher sparsity levels in both the model and the sampling scheme than the induced approach. We also show that if model sparsity remains below a threshold, inference quality is unaffected; beyond it, the quality degrades rapidly. The sufficient conditions for obtaining a Gaussian limit law also turn out to be necessary. For strictly balanced target subgraphs, we obtain sharp transitions from Gaussian to Poisson-based limit laws as sparsity levels increase. A complete description of limit laws for estimated subgraph counts is given for the induced case, with a near-complete one for the ego-centric case. These results also yield Gaussian and Poisson limit laws for the estimated clustering coefficient. Simulations support the theory across sparsity levels, and the proposed methodology is applied to a real data set.
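The pipeline summarized above can be illustrated with a small simulation: generate a population network from a sparse two-block SBM, draw a Bernoulli node sample, form the induced subgraph on the sampled nodes, and rescale the observed subgraph count by the inverse sampling probabilities (a Horvitz-Thompson-style correction). This is a minimal sketch under hypothetical parameter choices (block count, sparsity factor, sampling probability), not the paper's actual construction or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (illustrative only, not from the paper)
n = 2000          # population size
p_sample = 0.2    # Bernoulli node-sampling probability
rho = 0.01        # sparsity factor scaling all connection probabilities

# Two-block SBM: within-block edges more likely than between-block edges
blocks = rng.integers(0, 2, size=n)
B = rho * np.array([[5.0, 1.0],
                    [1.0, 5.0]])

# Generate a symmetric adjacency matrix for the population network
P = B[blocks[:, None], blocks[None, :]]
upper = np.triu(rng.random((n, n)) < P, k=1)
A = upper | upper.T

# Bernoulli node sampling; keep the induced subgraph on sampled nodes
sampled = rng.random(n) < p_sample
A_obs = A[np.ix_(sampled, sampled)]

def triangle_count(adj):
    """Count triangles in an undirected graph: trace(A^3) / 6."""
    adj = adj.astype(float)
    return np.trace(adj @ adj @ adj) / 6.0

# Each triangle survives induced-subgraph sampling with probability p^3,
# so dividing by p^3 gives an unbiased estimate of the population count.
T_pop = triangle_count(A)
T_hat = triangle_count(A_obs) / p_sample**3
print(f"population triangles: {T_pop:.0f}, estimate: {T_hat:.0f}")
```

Under the induced approach only edges between two sampled nodes are observed, which is why each triangle's inclusion probability is the cube of the node-sampling probability; the ego-centric approach, which also records sampled nodes' edges to unsampled neighbors, yields higher inclusion probabilities and hence tolerates sparser sampling.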