Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications where dataset sizes have reached significant scales. This technical report presents one of the most efficient implementations of the Leiden algorithm, a high-quality community detection method. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, our Leiden implementation, which we term GVE-Leiden, outperforms the original Leiden, igraph Leiden, NetworKit Leiden, and cuGraph Leiden (running on an NVIDIA A100 GPU) by 436x, 104x, 8.2x, and 3.0x respectively, achieving a processing rate of 403M edges/s on a 3.8B-edge graph. In addition, GVE-Leiden improves performance at an average rate of 1.6x for every doubling of threads.
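As a brief illustration of what Leiden-based community detection produces, the sketch below uses python-igraph's `community_leiden` (one of the baselines compared above) on a small toy graph. The toy graph and parameter choices are illustrative assumptions, not taken from the report or from GVE-Leiden itself.

```python
import igraph as ig

# Toy input: Zachary's karate club network (34 vertices, 78 edges).
g = ig.Graph.Famous("Zachary")

# Run igraph's Leiden implementation, optimizing modularity until the
# partition stabilizes (negative n_iterations means "run to convergence").
part = g.community_leiden(objective_function="modularity", n_iterations=-1)

print(part.membership)   # community label assigned to each vertex
print(part.modularity)   # modularity (quality) of the detected division
```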