Community detection is a fundamental problem in computational sciences, with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize modularity over the possible partitions of the network's nodes. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We evaluate (1) the ratio of each algorithm's output modularity to the maximum modularity for each input graph, and (2) the maximum similarity between its output partition and any optimal partition of that graph. We compare eight existing heuristic algorithms against an exact integer programming method that globally maximizes modularity. On average, a modularity-based heuristic algorithm returns optimal partitions for only 16.9% of the 80 graphs considered. Additionally, results on adjusted mutual information reveal substantial dissimilarity between the sub-optimal partitions and any optimal partition of the networks in our experiments. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities, exact or approximate optimization algorithms are recommended for a more methodologically sound usage of modularity within its applicability limits.
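To make the first evaluation measure concrete, the following minimal sketch computes the ratio of a partition's modularity to the maximum modularity on a toy graph. It is an illustration under stated assumptions, not the method used in the study: the six-node graph, the function names, and the "heuristic output" partition are all hypothetical, and the exact optimum is found by brute-force enumeration of all partitions (feasible only for tiny graphs) rather than by integer programming.

```python
def modularity(edges, labels):
    """Modularity of an undirected, unweighted graph.

    edges  : list of (u, v) pairs, each edge listed once
    labels : labels[i] is the community of node i
    """
    m = len(edges)
    degree_sum = {}   # total degree of the nodes in each community
    intra_edges = {}  # number of edges with both endpoints in the community
    for u, v in edges:
        degree_sum[labels[u]] = degree_sum.get(labels[u], 0) + 1
        degree_sum[labels[v]] = degree_sum.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            intra_edges[labels[u]] = intra_edges.get(labels[u], 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def all_partitions(n):
    """Yield every partition of n nodes exactly once, as a label tuple."""
    def extend(labels, n_blocks):
        if len(labels) == n:
            yield tuple(labels)
            return
        # A node may join an existing block or open one new block.
        for c in range(n_blocks + 1):
            yield from extend(labels + [c], max(n_blocks, c + 1))
    yield from extend([], 0)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n_nodes = 6

# Exhaustive search stands in for an exact solver on this tiny instance.
best = max(all_partitions(n_nodes), key=lambda p: modularity(edges, p))
q_opt = modularity(edges, best)

# A made-up sub-optimal partition standing in for a heuristic's output.
heuristic_output = (0, 0, 0, 0, 1, 1)
q_heur = modularity(edges, heuristic_output)

ratio = q_heur / q_opt  # measure (1): output modularity / maximum modularity
print(f"optimal partition {best}, Q_max = {q_opt:.4f}")
print(f"heuristic Q = {q_heur:.4f}, ratio = {ratio:.4f}")
```

The second measure, similarity to an optimal partition, could be computed on the same label tuples with an adjusted mutual information implementation such as `sklearn.metrics.adjusted_mutual_info_score`, taking the maximum over all optimal partitions when more than one exists.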