RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

Graph Random Walks (GRWs) offer efficient approximations of key graph properties and have been widely adopted in many applications. However, GRW workloads are notoriously difficult to accelerate because of their strong data dependencies, irregular memory access patterns, and imbalanced execution behavior. While recent work explores FPGA-based accelerators for GRWs, existing solutions fall far short of the hardware's potential due to inefficient pipelining and static scheduling. This paper presents RidgeWalker, a high-performance GRW accelerator designed for datacenter FPGAs. The key insight behind RidgeWalker is that the Markov property of GRWs allows a walk to be decomposed into stateless, fine-grained tasks that can be executed out of order without compromising correctness. Building on this, RidgeWalker introduces an asynchronous pipeline architecture with a feedback-driven scheduler grounded in queuing theory, enabling perfect pipelining and adaptive load balancing. We prototype RidgeWalker on datacenter FPGAs and evaluate it across a range of GRW algorithms and real-world graph datasets. Experimental results demonstrate that RidgeWalker achieves an average speedup of 7.0x over state-of-the-art FPGA solutions and 8.1x over GPU solutions, with peak speedups of up to 71.0x and 22.9x, respectively. The source code is publicly available at https://github.com/Xtra-Computing/RidgeWalker.
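The decomposition idea in the abstract can be illustrated with a small software sketch. This is not the RidgeWalker hardware design; it is a minimal Python analogy, assuming an unbiased walk on a toy graph. The names `GRAPH`, `step`, and `run_walks` are hypothetical and chosen only for illustration. Each task carries the walk ID, the current vertex, and the number of hops left, which is all the state a Markov step needs, so pending tasks from different walks can be drained in any order.

```python
import random

# Toy adjacency list; a hypothetical graph used only for illustration.
GRAPH = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [1, 2],
}

def step(task, rng):
    """Advance one walk by a single hop.

    A task is (walk_id, vertex, hops_left). By the Markov property the next
    vertex depends only on `vertex`, so the task carries all needed state.
    """
    walk_id, vertex, hops_left = task
    next_vertex = rng.choice(GRAPH[vertex])
    return (walk_id, next_vertex, hops_left - 1)

def run_walks(starts, length, seed=0):
    """Run unbiased walks by draining a shared task pool in arbitrary order."""
    rng = random.Random(seed)
    paths = {i: [v] for i, v in enumerate(starts)}
    pool = [(i, v, length) for i, v in enumerate(starts) if length > 0]
    while pool:
        # Pick any pending task: the order across walks is deliberately arbitrary.
        task = pool.pop(rng.randrange(len(pool)))
        walk_id, vertex, hops_left = step(task, rng)
        paths[walk_id].append(vertex)
        if hops_left > 0:
            # Re-enqueue the walk's next hop as a fresh, self-contained task.
            pool.append((walk_id, vertex, hops_left))
    return paths

paths = run_walks(starts=[0, 1, 2, 3], length=5)
for walk_id, path in paths.items():
    print(walk_id, path)
```

Because at most one task per walk is pending at any time, the hops inside a single walk still happen in sequence, while hops belonging to different walks interleave freely. In this sketch, that is the property that lets a scheduler reorder work without changing what each walk is allowed to do.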



FPGA: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Publisher: ACM. Site: http://dblp.uni-trier.de/db/conf/fpga/
