Real networks often have severe degree heterogeneity, with the maximum, average, and minimum node degrees differing significantly. This paper examines the impact of degree heterogeneity on the statistical limits of network data analysis. Introducing the heterogeneity distribution (HD) under a degree-corrected mixed-membership network model, we show that the optimal rate of mixed membership estimation is an explicit functional of the HD. This result confirms that severe degree heterogeneity may slow the convergence of the error rate, even when the overall sparsity remains unchanged. To obtain a rate-optimal method, we modify an existing spectral algorithm, Mixed-SCORE, by adding a pre-PCA normalization step. This step normalizes the adjacency matrix by a diagonal matrix consisting of the $b$-th power of node degrees, for some $b\in \mathbb{R}$. We discover that $b = 1/2$ is universally favorable. The resulting spectral algorithm is rate-optimal for networks with arbitrary degree heterogeneity. A technical component in our proofs is entry-wise eigenvector analysis of the normalized graph Laplacian.
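To make the pre-PCA normalization step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the normalization is two-sided, $D^{-b} A D^{-b}$ with $D$ the diagonal matrix of node degrees, so that $b = 1/2$ gives the symmetric normalization $D^{-1/2} A D^{-1/2}$ associated with the normalized graph Laplacian. The function names `degree_normalize` and `leading_eigenvectors`, and the handling of zero-degree nodes, are illustrative choices not taken from the paper. The later steps of Mixed-SCORE are not shown.

```python
import numpy as np


def degree_normalize(A, b=0.5):
    """Return D^{-b} A D^{-b}, where D is the diagonal matrix of node degrees.

    Assumption: the normalization is applied on both sides of A.
    Zero-degree nodes are given scale 0, so their rows and columns stay zero.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    scale = np.zeros_like(deg)
    positive = deg > 0
    scale[positive] = deg[positive] ** (-b)
    # Scale row i by scale[i] and column j by scale[j].
    return scale[:, None] * A * scale[None, :]


def leading_eigenvectors(M, K):
    """Return the K eigenpairs of a symmetric matrix M with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(vals))[:K]
    return vals[order], vecs[:, order]


# Toy example: a path graph on three nodes with degrees (1, 2, 1).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = degree_normalize(A, b=0.5)
vals, vecs = leading_eigenvectors(L, K=2)
```

With `b=0` the function returns the adjacency matrix unchanged, so the same code path can be used to compare the normalized and unnormalized spectral embeddings.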