Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications. This report presents an optimized implementation of the Label Propagation Algorithm (LPA) for community detection. It features an asynchronous LPA that applies a Pick-Less (PL) method every 4 iterations to handle community swaps, a design well suited to SIMT hardware such as GPUs. It also introduces a novel per-vertex hashtable that resolves collisions with hybrid quadratic-double probing. Running on an NVIDIA A100 GPU, our implementation, $\nu$-LPA, outperforms FLPA, NetworKit LPA, and GVE-LPA by 364x, 62x, and 2.6x, respectively, with the comparison made against a server with dual 16-core Intel Xeon Gold 6226R processors. $\nu$-LPA processes 3.0B edges/s on a 2.2B edge graph. It achieves 4.7% higher modularity than FLPA, but 6.1% and 2.2% lower modularity than NetworKit LPA and GVE-LPA, respectively.
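To make the two techniques named above more concrete, here is a minimal sequential Python sketch, not the GPU implementation itself. It rests on several assumptions that the abstract does not confirm: the Pick-Less rule is taken to mean that, on a PL iteration, a vertex may only move to a label with a smaller ID than its current one; PL iterations are assumed to be iterations 0, 4, 8, and so on; and the probe sequence `h1 + i*h2 + i*i` is only one plausible way to combine quadratic probing with double hashing. The names `VertexTable`, `lpa_pick_less`, `next_prime`, and `pl_period` are hypothetical.

```python
EMPTY = -1  # marks a free slot; labels are assumed to be non-negative ints


def next_prime(n):
    """Smallest prime >= n (trial division, adequate for a sketch)."""
    n = max(n, 2)
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n


class VertexTable:
    """Open-addressing table that accumulates edge weight per neighbor label."""

    def __init__(self, degree):
        # Assumption: a prime capacity above 2 * degree keeps the load below
        # 1/2, so the probe loop below always finds the key or a free slot.
        self.cap = next_prime(2 * degree + 1)
        self.keys = [EMPTY] * self.cap
        self.vals = [0.0] * self.cap

    def add(self, key, weight):
        h1 = key % self.cap
        h2 = 1 + key % (self.cap - 1)  # second hash, in 1..cap-1
        for i in range(self.cap):
            # Hypothetical hybrid step: double-hashing term plus quadratic term.
            s = (h1 + i * h2 + i * i) % self.cap
            if self.keys[s] in (EMPTY, key):
                self.keys[s] = key
                self.vals[s] += weight
                return
        raise RuntimeError("no free slot found")

    def items(self):
        return [(k, w) for k, w in zip(self.keys, self.vals) if k != EMPTY]


def lpa_pick_less(adj, max_iters=20, pl_period=4):
    """adj maps vertex -> list of (neighbor, weight); returns vertex -> label."""
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for it in range(max_iters):
        pick_less = it % pl_period == 0  # assumed PL schedule: 0, 4, 8, ...
        changed = 0
        for v in adj:  # asynchronous: later vertices see earlier updates
            table = VertexTable(len(adj[v]))
            for u, w in adj[v]:
                if u != v:
                    table.add(labels[u], w)
            weights = dict(table.items())
            if not weights:
                continue
            best_w = max(weights.values())
            if weights.get(labels[v], 0.0) >= best_w:
                continue  # current label is already a top choice
            best = min(c for c, w in weights.items() if w == best_w)
            if pick_less and best > labels[v]:
                continue  # PL iteration: only moves to smaller label IDs
            labels[v] = best
            changed += 1
        if changed == 0 and not pick_less:
            break  # converged on an unrestricted iteration
    return labels


# Two triangles (edge weight 2) joined by one weaker edge (weight 1).
graph = {
    0: [(1, 2.0), (2, 2.0)],
    1: [(0, 2.0), (2, 2.0)],
    2: [(0, 2.0), (1, 2.0), (3, 1.0)],
    3: [(2, 1.0), (4, 2.0), (5, 2.0)],
    4: [(3, 2.0), (5, 2.0)],
    5: [(3, 2.0), (4, 2.0)],
}
communities = lpa_pick_less(graph)
```

On this small graph each triangle ends up with a single label. On a GPU, where many vertices update in lockstep, two neighbors can adopt each other's labels in the same step and keep swapping; restricting moves to smaller label IDs on some iterations is one way to break that symmetry, which is the role the sketch assigns to the PL iterations.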