Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize modularity over the possible partitions of the network's nodes into communities. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning modularity-maximum (optimal) partitions. We evaluate (1) the ratio of their output modularity to the maximum modularity for each input graph and (2) the maximum similarity between their output partition and any optimal partition of that graph. Our computational experiments involve eight existing heuristic algorithms, which we compare against an exact integer programming method that globally maximizes modularity. On average, a modularity-based heuristic algorithm returns an optimal partition for only 16.9% of the 80 graphs considered. Results on adjusted mutual information show considerable dissimilarity between the sub-optimal partitions and any optimal partition of the graphs in our experiments. More importantly, our results show that near-optimal partitions tend to be disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based algorithms for discovering communities: they rarely return an optimal partition or a partition resembling an optimal partition. Given this finding, we recommend developing an exact or approximate algorithm for modularity maximization to support a more methodologically sound use of modularity in community detection.
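Evaluation criterion (1), the ratio of a heuristic's output modularity to the maximum modularity, can be illustrated on a toy graph. The sketch below is not the method used in the study: exhaustive enumeration of all partitions stands in for the exact integer programming method, and a naive greedy merging routine stands in for the eight heuristic algorithms. All function names (`modularity`, `all_partitions`, `greedy_merge`) and the six-node example graph are assumptions made for illustration. Criterion (2) would additionally need an adjusted mutual information routine, which is omitted here.

```python
from itertools import combinations


def modularity(edges, n, labels):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where L_c is the number of edges inside c and d_c its total degree.
    """
    m = len(edges)
    degree = [0] * n
    intra = {}    # community label -> number of internal edges
    deg_sum = {}  # community label -> sum of member degrees
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    for node in range(n):
        deg_sum[labels[node]] = deg_sum.get(labels[node], 0) + degree[node]
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in deg_sum.items())


def all_partitions(n):
    """Yield every partition of n nodes as a list of community labels."""
    def extend(labels, used):
        if len(labels) == n:
            yield list(labels)
            return
        # A node may join any existing community or open one new community.
        for c in range(used + 1):
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def greedy_merge(edges, n):
    """Illustrative heuristic: repeatedly apply the best modularity-increasing merge."""
    labels = list(range(n))  # start with every node in its own community
    best_q = modularity(edges, n, labels)
    while True:
        best_trial = None
        for a, b in combinations(sorted(set(labels)), 2):
            trial = [a if c == b else c for c in labels]
            q = modularity(edges, n, trial)
            if q > best_q + 1e-12:
                best_q, best_trial = q, trial
        if best_trial is None:  # no merge improves modularity any further
            return labels, best_q
        labels = best_trial


# Hypothetical toy graph: two triangles joined by a single edge.
n = 6
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Exhaustive search is only feasible for very small graphs.
q_max = max(modularity(edges, n, p) for p in all_partitions(n))
heuristic_labels, q_heur = greedy_merge(edges, n)
ratio = q_heur / q_max  # criterion (1): 1.0 means the heuristic is optimal here

print(f"max Q = {q_max:.4f}, heuristic Q = {q_heur:.4f}, ratio = {ratio:.4f}")
```

The number of partitions grows as the Bell numbers, so this enumeration is a teaching device only; graphs of realistic size need an exact solver of the kind the abstract describes.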