Community structure is a central object of network analysis, and many methods have been developed to detect communities. Most of them, however, require the number of communities K as an input, and in practice K must usually be estimated. Existing approaches for estimating K face two notable challenges: the weak community signal in sparse networks, and imbalance in community sizes or edge densities, which results in unequal per-community expected degrees. We propose a spectral method based on a novel network operator whose spectral properties overcome both challenges. This operator is a refined version of the non-backtracking operator, built from a "centered" adjacency matrix. For sparse networks, its leading eigenvalues are more concentrated than those of the adjacency matrix, and under imbalance they carry a stronger signal, a benefit attributed to the centering step. These properties are justified, either theoretically or numerically, under the null model K = 1, in both dense and ultra-sparse settings. A goodness-of-fit test based on the leading eigenvalue can then be applied to determine the number of communities K.
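As background for the operator mentioned in the abstract, the following is a minimal sketch of the standard (uncentered) non-backtracking matrix and its leading eigenvalues. It does not reproduce the centered operator proposed here, whose definition the abstract does not give. The function names `non_backtracking_matrix`, `leading_eigenvalues`, and `sample_erdos_renyi` are hypothetical helpers introduced only for illustration, and the dense eigenvalue solver is an assumption suited to small graphs.

```python
import numpy as np


def non_backtracking_matrix(adj):
    """Standard non-backtracking matrix of an undirected simple graph.

    Rows and columns are indexed by directed edges (u, v). The entry for
    the pair ((u, v), (v, w)) is 1 when w != u, and 0 otherwise.
    """
    n = adj.shape[0]
    # Each undirected edge {u, v} contributes two directed edges.
    edges = [(u, v) for u in range(n) for v in range(n) if adj[u, v]]
    index = {edge: i for i, edge in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for (u, v), i in index.items():
        for w in range(n):
            if adj[v, w] and w != u:  # continue along v -> w without returning to u
                B[i, index[(v, w)]] = 1.0
    return B


def leading_eigenvalues(B, k=3):
    """Return the k eigenvalues of B with the largest modulus."""
    vals = np.linalg.eigvals(B)
    order = np.argsort(-np.abs(vals))
    return vals[order[:k]]


def sample_erdos_renyi(n, p, seed=0):
    """Adjacency matrix of an Erdos-Renyi graph, i.e. a sample from the null model K = 1."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)


if __name__ == "__main__":
    adj = sample_erdos_renyi(n=60, p=0.1)
    B = non_backtracking_matrix(adj)
    print(np.abs(leading_eigenvalues(B, k=3)))
```

On a d-regular graph every row of this matrix sums to d - 1, so its leading eigenvalue is exactly d - 1; this gives a quick sanity check for the construction before applying it to random graphs.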