Community detection is a fundamental problem in computational sciences with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize modularity over the possible partitions of a network's nodes. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We evaluate (1) the ratio of each algorithm's output modularity to the maximum modularity for each input graph, and (2) the maximum similarity between its output partition and any optimal partition of that graph. We compare eight existing heuristic algorithms against an exact integer programming method that globally maximizes modularity. On average, a modularity-based heuristic algorithm returns an optimal partition for only 19.4% of the 80 graphs considered. Additionally, results on adjusted mutual information reveal substantial dissimilarity between the sub-optimal partitions and any optimal partition of the networks in our experiments. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities, exact or approximate optimization algorithms are recommended for a more methodologically sound use of modularity within its applicability limits.
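Evaluation measure (1) in the abstract, the ratio of a partition's modularity to the maximum modularity, can be illustrated on a toy graph. The following is only a minimal sketch under stated assumptions, not the pipeline used in the study: brute-force enumeration of all partitions stands in for the exact integer programming method, the six-node graph and the "heuristic" partition are hypothetical, and the names `modularity`, `all_partitions`, `q_max`, and `ratio` are illustrative. Measure (2), similarity to an optimal partition via adjusted mutual information, is not covered here.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted simple graph.

    Uses Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where
    L_c is the number of edges inside c, d_c the total degree of c, and
    m the number of edges in the graph.
    """
    m = len(edges)
    node_to_comm = {node: idx for idx, comm in enumerate(communities) for node in comm}
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = node_to_comm[u], node_to_comm[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def all_partitions(nodes):
    """Yield every partition of `nodes` as a list of lists.

    The count grows as the Bell numbers, so this is feasible only for tiny graphs.
    """
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for partial in all_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partial


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
nodes = sorted({n for e in edges for n in e})

# Exhaustive search as a stand-in for an exact optimizer (assumption).
q_max = max(modularity(edges, p) for p in all_partitions(nodes))

# Hypothetical sub-optimal output of some heuristic.
heuristic_partition = [[0, 1], [2, 3], [4, 5]]
q_heuristic = modularity(edges, heuristic_partition)

ratio = q_heuristic / q_max
print(f"Q_heuristic = {q_heuristic:.4f}, Q_max = {q_max:.4f}, ratio = {ratio:.4f}")
```

A ratio of 1 means the heuristic reached the maximum modularity; values below 1 quantify how far short it fell. Exhaustive enumeration does not scale beyond a handful of nodes, which is why an exact method on real networks needs a dedicated optimization formulation.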