Community detection is a fundamental task in data analysis, and block models can identify a wide variety of community structures while remaining highly interpretable. The degree-corrected block model (DCBM) is an established model that accounts for heterogeneity in node degrees. However, DCBM inference methods are computationally costly and highly sensitive to initialization, while cheaper alternatives, such as spectral or modularity-based approaches, are restricted to detecting specific structures, typically assortative ones. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Building on this insight, we propose a novel community detection method and a theoretically well-grounded initialization strategy that supplies inference algorithms with an initial estimate of the communities. Our approach does not assume any specific network structure and applies to any graph structure representable by a DCBM. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference while running faster; for instance, it processes a graph with 100,000 nodes and 1,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective on the DCBM.