Dynamic graph random walk (DGRW) has emerged as a practical tool for capturing structural relations within a graph. Executing DGRW efficiently on GPUs presents several challenges. First, existing sampling methods require a pre-processing buffer, which incurs substantial space overhead. Second, the power-law distribution of vertex degrees causes workload imbalance, making DGRW difficult to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method that fully exploits GPU parallelism and reduces space complexity. It also employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle massive numbers of walking queries. FlowWalker is memory-efficient: it requires no auxiliary data structures in GPU global memory. We evaluate FlowWalker extensively on ten datasets, and experimental results show that it achieves up to 752.2x, 72.1x, and 16.4x speedup over existing CPU, GPU, and FPGA random walk frameworks, respectively. A case study shows that FlowWalker reduces random walk time from 35% to 3% of a ByteDance friend recommendation GNN training pipeline.
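To make the point about pre-processing buffers concrete, the following is a minimal sequential sketch of one way to sample a neighbor in proportion to a runtime-computed weight without building a prefix-sum array or alias table. It uses single-item weighted reservoir sampling, which is an assumption chosen for illustration: the abstract does not name FlowWalker's sampling algorithm, and this is not its GPU kernel. The names `sample_neighbor`, `random_walk`, and `weight_fn` are hypothetical.

```python
import random


def sample_neighbor(neighbors, weight_of, rng=random):
    """Pick one neighbor with probability proportional to its weight.

    Single pass with O(1) extra state: no prefix-sum array or alias table.
    Returns None if no neighbor has a positive weight.
    """
    chosen = None
    total = 0.0
    for v in neighbors:
        w = weight_of(v)  # weight is computed on the fly (dynamic walk)
        if w <= 0:
            continue
        total += w
        # Replace the current pick with probability w / total.
        if chosen is None or rng.random() * total < w:
            chosen = v
    return chosen


def random_walk(adj, start, length, weight_fn, rng=random):
    """Walk up to `length` steps; weight_fn(prev, cur, nxt) may depend on walk state."""
    walk = [start]
    for _ in range(length):
        cur = walk[-1]
        prev = walk[-2] if len(walk) > 1 else None
        nxt = sample_neighbor(
            adj.get(cur, ()), lambda v: weight_fn(prev, cur, v), rng
        )
        if nxt is None:  # dead end: no neighbor with positive weight
            break
        walk.append(nxt)
    return walk


if __name__ == "__main__":
    # Toy graph (hypothetical data); the weight discourages stepping straight back.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    no_backtrack = lambda prev, cur, nxt: 0.25 if nxt == prev else 1.0
    print(random_walk(adj, start=0, length=8, weight_fn=no_backtrack))
```

Because each step keeps only a running weight total and the current pick, the extra memory per walker is constant regardless of vertex degree. A GPU implementation would additionally need to split the neighbor scan across threads and merge partial results, which this sketch does not attempt.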