The $\beta$-model is a powerful tool for modeling network generation driven by degree heterogeneity. Its simple yet expressive form makes it particularly well suited to large and sparse networks, where many network models become infeasible due to computational challenges and scarce observations. However, existing estimation algorithms for the $\beta$-model do not scale, and theoretical understanding remains limited to dense networks. This paper makes several significant improvements to the method and theory of the $\beta$-model to address urgent needs in practical applications. Our contributions include: 1. method: we propose a new $\ell_2$-penalized MLE scheme and design a novel fast algorithm that can comfortably handle sparse networks with millions of nodes, running much faster and using less memory than all existing algorithms; 2. theory: we present new error bounds for $\beta$-models under much weaker assumptions than the best known results in the literature; we also establish new lower bounds and new asymptotic normality results; under proper parameter sparsity assumptions, we show the first local rate-optimality result in $\ell_2$ norm; distinct from the existing literature, our results cover both small and large regularization scenarios and reveal their distinct asymptotic dependency structures; 3. application: we apply our method to large COVID-19 network data sets and discover meaningful results.
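To make the setting concrete, here is a minimal sketch of an $\ell_2$-penalized fit for the $\beta$-model, in which nodes $i$ and $j$ connect independently with probability $e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$. The penalty form $\frac{\lambda}{2}\sum_i(\beta_i-\bar\beta)^2$, the function names, and the plain gradient-ascent solver are assumptions made for illustration only. This is not the fast algorithm proposed in the paper, and its $O(n^2)$ cost per step would not scale to millions of nodes.

```python
import math

def edge_prob(beta_i, beta_j):
    """Beta-model edge probability: logistic(beta_i + beta_j)."""
    return 1.0 / (1.0 + math.exp(-(beta_i + beta_j)))

def fit_penalized(degrees, lam=1.0, steps=2000):
    """Gradient ascent on the beta-model log-likelihood minus an
    (assumed) ridge penalty (lam / 2) * sum_i (beta_i - mean(beta))^2.

    The likelihood depends on the graph only through the node degrees.
    """
    n = len(degrees)
    beta = [0.0] * n
    # Conservative step size: the curvature of the objective is bounded
    # by (n - 1) / 2 + lam (Gershgorin bound on the Hessian).
    lr = 1.0 / (0.5 * (n - 1) + lam)
    for _ in range(steps):
        mean_b = sum(beta) / n
        grad = []
        for i in range(n):
            # Expected degree of node i under the current parameters.
            expected = sum(edge_prob(beta[i], beta[j])
                           for j in range(n) if j != i)
            # Score (observed minus expected degree) minus penalty gradient.
            grad.append(degrees[i] - expected - lam * (beta[i] - mean_b))
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Toy graph on 4 nodes: node 0 links to everyone, and nodes 1 and 2 are linked.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
degrees = [sum(row) for row in adjacency]
beta_hat = fit_penalized(degrees, lam=1.0)
```

In this toy graph node 0 is connected to every other node, so the unpenalized MLE does not exist (its $\beta_0$ would diverge). The penalty keeps the estimate finite, which is one reason regularization helps with sparse or degenerate degree sequences.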