The analysis of network data has gained considerable interest in recent years. This includes the analysis of large, high-dimensional networks with hundreds or thousands of nodes. While exponential random graph models serve as the workhorse for network data analysis, fitting them to very large networks with classical inference, such as maximum likelihood or exact Bayesian estimation, is problematic owing to scaling and instability issues. The latter stem from the fact that classical network statistics treat nodes as exchangeable, i.e., actors in the network are assumed to be homogeneous. This assumption is often questionable. One way to circumvent it is to include actor-specific random effects, which account for unobservable heterogeneity. However, this considerably increases the number of unknowns, making the model highly parameterized. As a solution that remains feasible even for very large networks, we propose a scalable approach based on variational approximations, which not only leads to numerically stable estimation but is also applicable to high-dimensional directed as well as undirected networks. We furthermore demonstrate that including node-specific covariates can reduce node heterogeneity, which we facilitate through versatile prior formulations and a new measure that we call posterior explained variance. We illustrate our approach in three diverse examples, covering network data from the Italian Parliament, international arms trading, and Facebook, and conduct detailed simulation studies.