This paper proposes efficient solutions for $k$-core decomposition with high parallelism. The problem of $k$-core decomposition is fundamental in graph analysis and has applications across various domains. However, existing algorithms face significant challenges in achieving work-efficiency in theory and/or high parallelism in practice, and suffer from various performance bottlenecks. We present a simple, work-efficient parallel framework for $k$-core decomposition that is easy to implement and adaptable to various strategies for improving work-efficiency. We introduce two techniques to enhance parallelism: a sampling scheme to reduce contention on high-degree vertices, and vertical granularity control (VGC) to mitigate scheduling overhead for low-degree vertices. Furthermore, we design a hierarchical bucket structure to optimize performance for graphs with high coreness values. We evaluate our algorithm on a diverse set of real-world and synthetic graphs. Compared to state-of-the-art parallel algorithms, including ParK, PKC, and Julienne, our approach demonstrates superior performance on 23 out of 25 graphs when tested on a 96-core machine. Our algorithm shows speedups of up to 315$\times$ over ParK, 33.4$\times$ over PKC, and 52.5$\times$ over Julienne.
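For readers unfamiliar with the problem, the coreness of a vertex is the largest $k$ such that the vertex belongs to a subgraph in which every vertex has degree at least $k$. The following is a minimal sketch of the classic sequential bucket-based peeling procedure that computes it. It is not the parallel framework proposed in this paper, and it includes neither the sampling scheme, VGC, nor the hierarchical bucket structure. The function name `coreness` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def coreness(adj):
    """Sequential peeling baseline (illustrative only, not the paper's algorithm).

    adj: dict mapping each vertex to a list of its neighbors in a simple
         undirected graph (each edge listed in both directions).
    Returns a dict mapping each vertex to its coreness.
    """
    # Induced degree of each vertex among the vertices not yet peeled.
    deg = {v: len(adj[v]) for v in adj}
    max_deg = max(deg.values(), default=0)

    # buckets[d] holds vertices whose induced degree was d when they were added.
    # Entries may become stale; they are filtered lazily when popped.
    buckets = [[] for _ in range(max_deg + 1)]
    for v, d in deg.items():
        buckets[d].append(v)

    core = {}
    k = 0
    while k <= max_deg:
        if not buckets[k]:
            k += 1  # no vertex left at this level; move to the next one
            continue
        v = buckets[k].pop()
        if v in core or deg[v] != k:
            continue  # stale entry: already peeled, or degree has dropped since
        core[v] = k  # v is peeled in round k, so its coreness is k
        for u in adj[v]:
            # Only neighbors with induced degree above k lose a degree;
            # the others will be peeled in this same round anyway.
            if u not in core and deg[u] > k:
                deg[u] -= 1
                buckets[deg[u]].append(u)
    return core


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
result = coreness(example)
```

In this example the triangle vertices have coreness 2 and the pendant vertex has coreness 1. Parallel algorithms of the kind discussed above peel all vertices of the current level concurrently, which is where contention on high-degree vertices and scheduling overhead on low-degree vertices arise.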