Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications, where dataset sizes have reached significant scales. This technical report presents one of the most efficient parallel implementations of the Leiden algorithm, a high-quality community detection method. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, our Leiden implementation, which we term GVE-Leiden, outperforms the original Leiden, igraph Leiden, and NetworKit Leiden by 436x, 104x, and 8.2x respectively, achieving a processing rate of 403M edges/s on a 3.8B edge graph. Compared to GVE-Louvain, our parallel Louvain implementation, GVE-Leiden completely eliminates disconnected communities, with only a 13% increase in runtime. In addition, GVE-Leiden improves performance at an average rate of 1.6x for every doubling of threads.
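The key quality property mentioned above, that a community should induce a connected subgraph (which Leiden guarantees and Louvain does not), can be verified directly on a clustering result. The following is a minimal sketch of such a check; the function name, edge-list representation, and membership mapping are illustrative assumptions, not part of GVE-Leiden's API.

```python
from collections import defaultdict, deque

def communities_are_connected(edges, membership):
    """Check that every community induces a connected subgraph.

    edges: iterable of (u, v) pairs (undirected).
    membership: dict mapping node -> community id.
    Returns True iff no community is internally disconnected,
    i.e. the property Leiden guarantees and Louvain does not.
    (Illustrative helper, not part of GVE-Leiden.)
    """
    # Build adjacency restricted to intra-community edges only.
    adj = defaultdict(set)
    for u, v in edges:
        if membership[u] == membership[v]:
            adj[u].add(v)
            adj[v].add(u)
    # Group nodes by community id.
    groups = defaultdict(set)
    for node, comm in membership.items():
        groups[comm].add(node)
    # BFS each community from an arbitrary seed; all members must be reached.
    for nodes in groups.values():
        seed = next(iter(nodes))
        seen = {seed}
        queue = deque([seed])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y in nodes and y not in seen:
                    seen.add(y)
                    queue.append(y)
        if seen != nodes:
            return False  # this community splits into >1 component
    return True
```

For example, a clustering that places two nodes with no intra-community path into the same community fails this check, which is exactly the pathology the Leiden refinement phase is designed to prevent.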