Researchers commonly encounter large-scale networks in practice (e.g., Facebook and Twitter). To study the interactions among the nodes of such networks, the spatial autoregressive (SAR) model is widely used. Despite its popularity, estimating a SAR model on a large-scale network remains very challenging. On the one hand, policy restrictions or high collection costs often make it impossible for independent researchers to observe or collect the full network. On the other hand, even if the entire network is accessible, computing the quasi-maximum likelihood estimator (QMLE) of the SAR model can be computationally infeasible. To address these challenges, we propose a subnetwork estimation method based on the QMLE for the SAR model. Using an appropriate sampling method, we construct a subnetwork consisting of a much smaller number of nodes. The standard QMLE is then computed by treating the sampled subnetwork as if it were the entire network. This significantly reduces both the information collection cost and the computation cost, making estimation more feasible in practice. Theoretically, we show that the subnetwork-based QMLE is consistent and asymptotically normal under appropriate regularity conditions. Extensive simulation studies, based on both simulated and real network structures, are presented.
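The sample-then-estimate procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the SAR model is written as Y = rho * W * Y + X * beta + eps with a row-normalized W, the network is simulated as disjoint communities, and the sampling rule keeps whole communities (a case in which the sampled nodes follow a SAR model of their own). The helper names `row_normalize` and `sar_qmle` are hypothetical, and the grid search over rho stands in for whatever optimizer one would actually use.

```python
import numpy as np


def row_normalize(A):
    """Divide each row by its sum; rows with no links stay all-zero."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)


def sar_qmle(Y, X, W, grid=None):
    """Gaussian QMLE for Y = rho*W*Y + X*beta + eps via a grid search on rho.

    For fixed rho, beta and sigma^2 have closed forms, so only the
    concentrated log-likelihood in rho has to be maximized.
    """
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 397)  # step 0.005 (assumed resolution)
    n = Y.shape[0]
    # log|I - rho*W| = sum_i log(1 - rho*lambda_i), eigenvalues computed once.
    lam = np.linalg.eigvals(W).astype(complex)
    WY = W @ Y
    X_pinv = np.linalg.pinv(X)               # least-squares projector
    b0, b1 = X_pinv @ Y, X_pinv @ WY         # beta(rho) = b0 - rho*b1
    e0, e1 = Y - X @ b0, WY - X @ b1         # resid(rho) = e0 - rho*e1
    best_ll, best_rho = -np.inf, 0.0
    for rho in grid:
        resid = e0 - rho * e1
        sigma2 = resid @ resid / n
        logdet = np.log(1.0 - rho * lam).sum().real
        ll = logdet - 0.5 * n * np.log(sigma2)
        if ll > best_ll:
            best_ll, best_rho = ll, rho
    resid = e0 - best_rho * e1
    return best_rho, b0 - best_rho * b1, resid @ resid / n


# Simulated "full" network: 50 disjoint communities of 20 nodes (assumption).
rng = np.random.default_rng(0)
n_comm, size = 50, 20
n = n_comm * size
labels = np.repeat(np.arange(n_comm), size)
same_comm = labels[:, None] == labels[None, :]
A = ((rng.random((n, n)) < 0.2) & same_comm).astype(float)
np.fill_diagonal(A, 0.0)
W = row_normalize(A)

# Generate responses from the SAR model on the full network.
rho_true, beta_true = 0.4, np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)
Y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

# Sample 30 whole communities and keep only the links among sampled nodes.
kept = rng.choice(n_comm, size=30, replace=False)
nodes = np.flatnonzero(np.isin(labels, kept))
W_sub = row_normalize(A[np.ix_(nodes, nodes)])

# Treat the subnetwork as if it were the entire network.
rho_hat, beta_hat, sigma2_hat = sar_qmle(Y[nodes], X[nodes], W_sub)
print("rho_hat:", rho_hat, "beta_hat:", beta_hat, "sigma2_hat:", sigma2_hat)
```

In this sketch the estimator only ever sees the 600 sampled nodes and the links among them, so both the data that must be collected and the size of the eigenvalue problem shrink with the subnetwork. With other sampling rules, links to unsampled nodes are dropped, and whether the resulting estimator remains well behaved depends on the sampling method and regularity conditions that the paper itself studies.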