Community detection is a classic problem in network science with extensive applications in various fields. Among numerous approaches, the most common method is modularity maximization. Although heuristic modularity maximization algorithms are designed to maximize modularity and are widely adopted, they rarely return an optimal partition or anything similar to one. We propose a specialized algorithm, Bayan, which returns partitions with a guarantee of either optimality or proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that either solves an integer programming formulation of the modularity maximization problem to optimality or approximates the optimum within a factor. We compare Bayan against 30 alternative community detection methods using structurally diverse synthetic and real networks. Our results demonstrate Bayan's distinctive accuracy and stability in retrieving ground-truth communities of standard benchmark graphs. Bayan is several times faster than open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Overall, our assessments point to Bayan as a suitable choice for exact maximization of modularity in real networks with up to 3000 edges (in their largest connected component) and for approximating maximum modularity in larger instances on ordinary computers. A Python implementation of the Bayan algorithm (the bayanpy library) is publicly available through the package installer for Python (pip).
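To make the objective concrete, the following minimal sketch computes modularity for a given partition and finds the exact maximum on a toy graph by enumerating every partition. This is plain brute force, not Bayan's branch-and-cut scheme and not the bayanpy API; it is only feasible for a handful of nodes. The function names (`modularity`, `all_partitions`, `exact_max_modularity`) and the example graph (two triangles joined by one edge) are illustrative assumptions, and the standard modularity definition for unweighted, undirected graphs is assumed.

```python
def modularity(edges, labels):
    """Modularity of a partition of an unweighted, undirected graph.

    edges  : list of (u, v) pairs, nodes numbered 0..n-1 (assumed format)
    labels : sequence where labels[i] is the community of node i
    """
    m = len(edges)
    degree = [0] * len(labels)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> sum of degrees of its nodes
    for node, d in enumerate(degree):
        degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + d
    for u, v in edges:
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1

    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


def all_partitions(n):
    """Yield every partition of n nodes as a restricted-growth label tuple."""
    def extend(labels, used):
        if len(labels) == n:
            yield tuple(labels)
            return
        # A node may join any existing community or open one new community.
        for c in range(used + 1):
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def exact_max_modularity(edges, n):
    """Brute-force exact maximum; the partition count grows as the Bell numbers."""
    best = max(all_partitions(n), key=lambda labels: modularity(edges, labels))
    return modularity(edges, best), best


# Toy graph (hypothetical example): two triangles joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

if __name__ == "__main__":
    q, labels = exact_max_modularity(EDGES, 6)
    print(f"max modularity = {q:.4f}, partition = {labels}")
```

Even at six nodes the search visits 203 partitions, and the count grows super-exponentially with the number of nodes, which is why exact methods rely on integer programming with bounding and cutting rather than enumeration.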