Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications where datasets have reached significant scale. This technical report presents an optimized parallel implementation of Leiden, a high-quality community detection method, for shared-memory multicore systems. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, our implementation, which we term GVE-Leiden, outperforms the original Leiden, igraph Leiden, and NetworKit Leiden by 373x, 86x, and 7.2x, respectively, achieving a processing rate of 352M edges/s on a 3.8B-edge graph. Compared to GVE-Louvain, our optimized parallel Louvain implementation, GVE-Leiden achieves an 11x reduction in disconnected communities, with only a 36% increase in runtime. In addition, GVE-Leiden improves performance at an average rate of 1.6x for every doubling of threads.
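To put the reported figures in perspective, the following back-of-the-envelope sketch derives what they imply. It is only arithmetic on the numbers quoted above, not a measurement. The function names are hypothetical. The thread count of 32 is an assumption (one thread per core on the dual 16-core machine), and applying the average 1.6x rate uniformly across every doubling is also an assumption.

```python
import math

# Figures quoted in the text above.
EDGES = 3.8e9                 # edges in the largest graph
RATE = 352e6                  # edges processed per second
SPEEDUP_PER_DOUBLING = 1.6    # average gain per doubling of threads

# Assumption (not stated in the text): one thread per core, 2 x 16 cores.
THREADS = 32


def implied_runtime_seconds(edges: float, rate: float) -> float:
    """Runtime implied by an edge count and a processing rate."""
    return edges / rate


def implied_speedup(threads: int, per_doubling: float) -> float:
    """Speedup over one thread if every doubling gave the same gain."""
    doublings = math.log2(threads)
    return per_doubling ** doublings


def parallel_efficiency(threads: int, per_doubling: float) -> float:
    """Fraction of ideal linear speedup that the implied speedup achieves."""
    return implied_speedup(threads, per_doubling) / threads


if __name__ == "__main__":
    print(f"implied runtime: {implied_runtime_seconds(EDGES, RATE):.1f} s")
    print(f"implied speedup at {THREADS} threads: "
          f"{implied_speedup(THREADS, SPEEDUP_PER_DOUBLING):.1f}x")
    print(f"implied parallel efficiency: "
          f"{parallel_efficiency(THREADS, SPEEDUP_PER_DOUBLING):.0%}")
```

Under these assumptions, the 3.8B-edge graph would take roughly 11 seconds, and five doublings from 1 to 32 threads would compound to about a 10x speedup over a single thread. The actual per-doubling gains may vary with thread count, so these values are indicative only.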