Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity

Community detection is a classic network problem with extensive applications in various fields. The most common approach relies on modularity maximization heuristics, which rarely return an optimal partition or anything similar to one. Partitions with globally optimal modularity are difficult to compute and have therefore been underexplored. Using structurally diverse networks, we compare 30 community detection methods, including our proposed algorithm that offers optimality and approximation guarantees: the Bayan algorithm. Unlike existing methods, Bayan globally maximizes modularity or approximates the maximum within a factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions, which retrieve planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well-clusteredness. These advantages come at the cost of additional computations, which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well-performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python.
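To make the notion of a globally optimal modularity partition concrete, the following is a minimal sketch for an unweighted, undirected graph without self-loops. It is not the Bayan algorithm and does not use bayanpy: it simply enumerates every partition of the node set and keeps the one with the highest modularity, which is only feasible for a handful of nodes. The function names (`modularity`, `set_partitions`, `brute_force_max_modularity`) and the example graph are assumptions introduced here for illustration.

```python
def modularity(edges, labels):
    """Modularity of a partition; labels maps node -> community id."""
    m = len(edges)
    degree_sum = {}   # total degree of the nodes in each community
    internal = {}     # number of edges with both endpoints in the community
    for u, v in edges:
        for node in (u, v):
            c = labels[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def set_partitions(nodes):
    """Yield every partition of nodes exactly once, as a node -> block dict."""
    def extend(i, labels, n_blocks):
        if i == len(nodes):
            yield dict(zip(nodes, labels))
            return
        # Put node i into an existing block, or open one new block.
        for b in range(n_blocks + 1):
            yield from extend(i + 1, labels + [b], max(n_blocks, b + 1))
    yield from extend(0, [], 0)


def brute_force_max_modularity(edges):
    """Exhaustive search; the number of partitions grows as the Bell numbers."""
    nodes = sorted({node for edge in edges for node in edge})
    best_q, best_labels = float("-inf"), None
    for labels in set_partitions(nodes):
        q = modularity(edges, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels


if __name__ == "__main__":
    # Hypothetical example: two triangles joined by a single edge.
    example_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    q, labels = brute_force_max_modularity(example_edges)
    print(q, labels)
```

Exhaustive enumeration is shown only because it is guaranteed to be exact on tiny inputs. Since the number of partitions grows extremely fast with the number of nodes, networks of the size discussed in the abstract need a dedicated exact or approximate method.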
