Community detection is the problem of identifying densely connected clusters of nodes within a network. The Louvain algorithm is a widely used method for this task, but it can produce internally disconnected communities. The Leiden algorithm was introduced to address this. However, our analysis and empirical observations indicate that the Leiden algorithm still identifies disconnected communities, albeit to a lesser extent. To mitigate this issue, we propose two new parallel algorithms, GSP-Leiden and GSP-Louvain, based on the Leiden and Louvain algorithms, respectively. On a system with two 16-core Intel Xeon Gold 6226R processors, we demonstrate that GSP-Leiden/GSP-Louvain not only address this issue but also outperform the original Leiden, igraph Leiden, and NetworKit Leiden by 190x/341x, 46x/83x, and 3.4x/6.1x, respectively, achieving a processing rate of 195M/328M edges/s on a 3.8B-edge graph. Furthermore, the performance of GSP-Leiden/GSP-Louvain improves by 1.6x/1.5x for every doubling of threads.
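To make the notion of an internally disconnected community concrete, the following minimal sketch checks a given partition for such communities. It is an illustration only, not the GSP-Leiden or GSP-Louvain method: the function name `disconnected_communities`, the edge-list input, and the `membership` dictionary are assumptions made for this example. A community counts as disconnected if the subgraph induced by its vertices has more than one connected component.

```python
from collections import defaultdict


def disconnected_communities(edges, membership):
    """Return the ids of communities whose induced subgraph is not connected.

    edges: iterable of (u, v) pairs of an undirected graph (hypothetical input format).
    membership: dict mapping each vertex to its community id.
    """
    # Union-find over vertices; only edges inside a community are merged,
    # so each resulting set is one connected piece of one community.
    parent = {v: v for v in membership}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if membership[u] == membership[v]:
            parent[find(u)] = find(v)

    # Collect the distinct component roots seen in each community.
    roots = defaultdict(set)
    for v, c in membership.items():
        roots[c].add(find(v))

    # More than one root means the community splits into several pieces.
    return {c for c, r in roots.items() if len(r) > 1}


# A path 0-1-2-3-4 where vertex 2 belongs to another community:
# community "a" is split into {0, 1} and {3, 4}.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
membership = {0: "a", 1: "a", 2: "b", 3: "a", 4: "a"}
print(disconnected_communities(edges, membership))  # {'a'}
```

A check of this kind can be run on the output of any community detection method to count how many of the returned communities are internally disconnected.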