As one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptability to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87x to 17.13x over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.
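To make the two core ideas in the abstract concrete (unified Read/Write primitives and cache-like collaboration between queue levels), the following is a minimal CPU-side C++ sketch. All names (`RingQueue`, `TwoLevelQueue`, `read`, `write`) are illustrative assumptions, not the paper's actual API, and the "near"/"far" levels only stand in for the GPU's shared-memory and global-memory tiers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical single queue: a fixed-capacity ring buffer exposing the two
// unified primitives. On a GPU, one such queue could live per thread block
// in shared memory; here it is an ordinary host-side container.
struct RingQueue {
    std::vector<uint32_t> buf;
    size_t head = 0, tail = 0;  // monotonically increasing; indices taken mod capacity
    explicit RingQueue(size_t cap) : buf(cap) {}
    size_t size() const { return tail - head; }

    // Write primitive: enqueue one vertex id; returns false when full so the
    // caller can spill to the next level instead of blocking.
    bool write(uint32_t v) {
        if (size() == buf.size()) return false;
        buf[tail++ % buf.size()] = v;
        return true;
    }

    // Read primitive: dequeue one vertex id, or empty if the queue is drained.
    std::optional<uint32_t> read() {
        if (head == tail) return std::nullopt;
        return buf[head++ % buf.size()];
    }
};

// Two-level composition: a small "near" queue backed by a larger "far" queue.
// Writes spill downward on overflow and reads prefer the near level, which is
// the cache-like collaboration pattern the abstract describes.
struct TwoLevelQueue {
    RingQueue near_q, far_q;
    TwoLevelQueue(size_t near_cap, size_t far_cap)
        : near_q(near_cap), far_q(far_cap) {}

    void write(uint32_t v) {
        if (!near_q.write(v)) far_q.write(v);  // spill on overflow
    }
    std::optional<uint32_t> read() {
        if (auto v = near_q.read()) return v;  // hit in the near level
        return far_q.read();                   // miss: fall back to the far level
    }
};
```

Because every level implements the same two primitives, levels compose uniformly; the real MLMQ additionally distributes many such queues across the GPU's parallelism hierarchy and selects their configuration per input.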