Community detection is the problem of identifying natural divisions in networks, and efficient parallel algorithms for it are critical in a number of applications. This report presents an optimized implementation of the Label Propagation Algorithm (LPA) for community detection. It uses an asynchronous LPA that applies a Pick-Less (PL) method every 4 iterations to handle community swaps, a design well suited to SIMT hardware such as GPUs. It also introduces a novel per-vertex hashtable that resolves collisions with hybrid quadratic-double probing. On an NVIDIA A100 GPU, our implementation, $\nu$-LPA, outperforms FLPA (sequential), NetworKit LPA (multicore), Gunrock LPA (GPU), and cuGraph Louvain (GPU) by 364x, 62x, 2.6x, and 37x, respectively, with FLPA and NetworKit LPA run on a server with dual 16-core Intel Xeon Gold 6226R processors. $\nu$-LPA processes 3.0B edges/s on a 2.2B-edge graph and achieves 4.7% higher modularity than FLPA, but 6.1% and 9.6% lower modularity than NetworKit LPA and cuGraph Louvain, respectively.
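To make the idea of asynchronous label propagation with a periodic Pick-Less step concrete, the following is a minimal sequential sketch in Python, not the $\nu$-LPA GPU implementation. Several details are assumptions made for illustration: the function name `lpa_pick_less` and its parameters are hypothetical; the Pick-Less rule is taken to mean that, in a PL iteration, a vertex may only move to a label with a smaller ID than its current one; ties are broken toward the smaller label; the convergence tolerance is arbitrary; and a Python dictionary stands in for the per-vertex hashtable, so the hybrid quadratic-double probing scheme is not modeled here.

```python
from collections import defaultdict


def lpa_pick_less(adj, max_iters=20, pl_period=4, tolerance=0.05):
    """Sequential sketch of asynchronous LPA with a periodic Pick-Less step.

    adj: dict mapping each integer vertex to a list of (neighbor, weight) pairs.
    Returns a dict mapping each vertex to its community label.
    All parameter names and defaults are illustrative assumptions.
    """
    labels = {v: v for v in adj}  # every vertex starts in its own community
    n = len(adj)

    for it in range(max_iters):
        # Assumption: PL is active on iterations 0, pl_period, 2 * pl_period, ...
        pick_less = (it % pl_period == 0)
        changed = 0

        for v in adj:
            # Per-vertex table: total edge weight towards each neighboring label.
            # (A dict replaces the open-addressing hashtable used on the GPU.)
            weights = defaultdict(float)
            for u, w in adj[v]:
                if u != v:
                    weights[labels[u]] += w
            if not weights:
                continue  # isolated vertex keeps its label

            # Heaviest label wins; ties go to the smaller label ID.
            best = max(weights, key=lambda c: (weights[c], -c))
            if best == labels[v]:
                continue
            # Pick-Less: only allow moves to a smaller label ID, so two
            # vertices cannot keep swapping labels with each other.
            if pick_less and best > labels[v]:
                continue

            # Asynchronous update: later vertices in this pass see the change.
            labels[v] = best
            changed += 1

        # Assumption: convergence is only checked on non-PL iterations,
        # because PL iterations suppress some moves by design.
        if not pick_less and changed <= tolerance * n:
            break

    return labels
```

On a small graph made of two triangles joined by one light edge, this sketch would be expected to keep the two triangles in separate communities. On a GPU, the inner loop over `adj[v]` would instead be carried out by threads that accumulate weights into a fixed-capacity hashtable, which is where a collision-resolution strategy becomes relevant.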