Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications, where the size of datasets has reached significant scales. This technical report presents an optimized parallel implementation of Leiden, a high-quality community detection method, for shared-memory multicore systems. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, our Leiden implementation, which we term GVE-Leiden, outperforms the original Leiden, igraph Leiden, and NetworKit Leiden by 436x, 104x, and 8.2x, respectively, achieving a processing rate of 403 M edges/s on a 3.8 B edge graph. Compared to GVE-Louvain, our parallel Louvain implementation, GVE-Leiden completely eliminates disconnected communities, with only a 13% increase in runtime. In addition, GVE-Leiden improves performance at an average rate of 1.6x for every doubling of threads.
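To make the notion of a disconnected community concrete, the following is a minimal Python sketch, not taken from GVE-Leiden itself, that counts communities whose members are not all reachable from one another using only intra-community edges. The function name `count_disconnected_communities` and the edge-list and dictionary input formats are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def count_disconnected_communities(edges, membership):
    """Count communities whose induced subgraph is not connected.

    edges: iterable of (u, v) pairs describing an undirected graph.
    membership: dict mapping every vertex to its community id.
    (Both input formats are illustrative assumptions.)
    """
    # Build adjacency lists using only edges that stay inside one community.
    adj = defaultdict(list)
    for u, v in edges:
        if membership[u] == membership[v]:
            adj[u].append(v)
            adj[v].append(u)

    # Group vertices by community.
    members = defaultdict(list)
    for vertex, community in membership.items():
        members[community].append(vertex)

    disconnected = 0
    for vertices in members.values():
        # Breadth-first search from an arbitrary member of the community.
        start = vertices[0]
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        # If some member was never reached, the community is disconnected.
        if len(seen) < len(vertices):
            disconnected += 1
    return disconnected


# Path graph 0-1-2-3: community "a" = {0, 3} has no internal edge,
# community "b" = {1, 2} is joined by the edge (1, 2).
path_edges = [(0, 1), (1, 2), (2, 3)]
print(count_disconnected_communities(path_edges, {0: "a", 1: "b", 2: "b", 3: "a"}))  # prints 1
```

A check of this kind returning zero on a detected partition is what the elimination of disconnected communities amounts to; the sequential breadth-first search here is chosen for clarity rather than performance.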