The single-source shortest path (SSSP) problem is a well-studied problem with many applications. In the parallel setting, a work-efficient algorithm that also attains $o(n)$ parallel depth has remained elusive. As an alternative, various approaches have been developed that exploit specific properties of particular classes of graphs. On graphics processing units (GPUs), the current state-of-the-art SSSP algorithms are implementations of the Delta-stepping algorithm, which does not perform well on graphs with large diameters. The main contribution of this work is an algorithm designed for GPUs that runs efficiently on such graphs. We present the parallel bucket heap, a parallel cache-efficient data structure adapted to modern GPU architectures that supports the standard priority queue operations as well as bulk update. We analyze the structure in several well-known computational models and show that it both provides optimal parallelism and is cache-efficient. We implement the parallel bucket heap and use it in a parallel variant of Dijkstra's algorithm to solve the SSSP problem. Experimental results indicate that, for sufficiently large, dense graphs with high diameter, our implementation outperforms the current state-of-the-art SSSP implementations on an NVIDIA RTX 2080 Ti and a Quadro M4000 by factors of up to 2.8 and 5.4, respectively.
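For orientation, the following is a minimal sequential sketch of Dijkstra's algorithm driven by a priority queue, showing the operations (insert or update, and delete-min) that a structure such as the parallel bucket heap has to support. It is not the parallel bucket heap or the GPU algorithm described above: Python's binary heap `heapq` stands in for the priority queue, stale entries are skipped in place of a decrease-key operation, and the function name `dijkstra` and the adjacency-list layout are assumptions made for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Sequential Dijkstra sketch (illustrative only).

    adj: dict mapping a vertex to a list of (neighbor, weight) pairs,
         with nonnegative weights. This layout is an assumption.
    Returns a dict of shortest distances for vertices reachable from source.
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)  # delete-min
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already settled
        # Relax all outgoing edges of u. In a bulk-update setting these
        # updates would be handed to the queue as one batch.
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # insert in place of decrease-key
    return dist


graph = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: [(3, 5)]}
print(dijkstra(graph, 0))
```

In this sketch every relaxation is pushed individually; the bulk update mentioned above corresponds to submitting all relaxations of a settled vertex together.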