Community detection is the problem of identifying tightly connected clusters of nodes within a network. Efficient parallel algorithms for this problem play a crucial role in many applications, especially as datasets grow large. The Label Propagation Algorithm (LPA) is commonly used for this purpose because it is easy to parallelize, fast, and scalable; however, it may yield internally disconnected communities. This technical report introduces GSL-LPA, derived from GVE-LPA, our parallelization of LPA. Our experiments on a system with two 16-core Intel Xeon Gold 6226R processors show that GSL-LPA not only mitigates this issue but also outperforms FLPA, igraph LPA, and NetworKit LPA by 55x, 10,300x, and 5.8x, respectively, achieving a processing rate of 844M edges/s on a 3.8B edge graph. Additionally, GSL-LPA's performance improves by 1.6x for every doubling of threads.
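To make the disconnected-community issue concrete, the following is a minimal sequential sketch in Python; it is not the parallel GVE-LPA or GSL-LPA implementation. The function names, the smallest-label tie-breaking rule, and the breadth-first splitting pass are assumptions made for illustration, since the abstract does not describe how GSL-LPA handles disconnected communities.

```python
from collections import Counter, deque


def label_propagation(adj, max_iters=20):
    """Sequential LPA on an undirected graph given as {node: [neighbors]}.

    Illustrative only: nodes are visited in sorted order and ties go to the
    smallest label so the result is deterministic (LPA implementations
    commonly use random order and random tie-breaking instead).
    """
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iters):
        changed = 0
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            # Count how often each label occurs among the neighbors of v.
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed += 1
        if changed == 0:
            break  # converged: no node changed its label in this pass
    return labels


def split_disconnected(adj, labels):
    """Relabel nodes so that every community is internally connected.

    Hypothetical post-processing pass: a breadth-first search from each
    unvisited node follows only edges whose endpoints share a label, so each
    connected piece of a community receives its own new label.
    """
    new_labels = {}
    for start in sorted(adj):
        if start in new_labels:
            continue
        new_labels[start] = start
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in new_labels and labels[u] == labels[start]:
                    new_labels[u] = start
                    queue.append(u)
    return new_labels
```

As a usage example, consider the path 0-1-2-3-4 where node 2 sits in its own community and the other four nodes share one label: nodes {0, 1} and {3, 4} then belong to the same community without being connected inside it, and `split_disconnected` assigns the two pieces separate labels.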