FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Dynamic graph random walk (DGRW) has emerged as a practical tool for capturing structural relations within a graph. Executing DGRW efficiently on GPUs presents several challenges. First, existing sampling methods require a pre-processing buffer, which incurs substantial space overhead. Second, the power-law distribution of vertex degrees causes workload imbalance, making DGRW difficult to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method that fully exploits GPU parallelism and reduces space complexity. It also employs a sampler-centric paradigm together with a dynamic scheduling strategy to handle large volumes of walking queries. FlowWalker is memory-efficient: it requires no auxiliary data structures in GPU global memory. We evaluate FlowWalker extensively on ten datasets, and experimental results show that it achieves up to 752.2x, 72.1x, and 16.4x speedup over existing CPU, GPU, and FPGA random walk frameworks, respectively. A case study shows that FlowWalker reduces the share of time spent on random walks from 35% to 3% in a ByteDance friend-recommendation GNN training pipeline.
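
The abstract does not name the sampling method, so the following is only an illustrative sketch of one well-known way to pick a weighted neighbor without a pre-processing buffer: single-pass reservoir-style sampling. It is sequential CPU code, not the paper's GPU kernel, and the names `reservoir_pick`, `dynamic_walk`, and `weight_fn` are hypothetical. The point is that transition weights can depend on the walker's state (hence "dynamic") and still be sampled with O(1) extra memory per walker.

```python
import random


def reservoir_pick(neighbors, weight_fn, rng):
    """Pick one neighbor with probability proportional to its weight.

    Single pass, O(1) extra memory: no prefix-sum or alias table is built.
    Returns None if no neighbor has positive weight.
    """
    chosen = None
    total = 0.0
    for v in neighbors:
        w = weight_fn(v)
        if w <= 0.0:
            continue
        total += w
        # Replace the current choice with probability w / total.
        if rng.random() * total < w:
            chosen = v
    return chosen


def dynamic_walk(graph, start, length, weight_fn, rng):
    """Walk up to `length` steps; weights are computed on the fly.

    `weight_fn(prev, cur, candidate)` may depend on the walker's history,
    which is why the weights cannot be precomputed per edge.
    """
    path = [start]
    prev, cur = None, start
    for _ in range(length):
        nxt = reservoir_pick(
            graph[cur], lambda v: weight_fn(prev, cur, v), rng
        )
        if nxt is None:  # dead end: no neighbor with positive weight
            break
        path.append(nxt)
        prev, cur = cur, nxt
    return path
```

For example, `dynamic_walk(graph, 0, 5, lambda p, c, v: 0.5 if v == p else 1.0, random.Random(0))` runs a walk on an adjacency-list dictionary `graph` that is half as likely to step back to the previous vertex as to move elsewhere.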


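Related topic: Random walk

In mathematics, a random walk is a stochastic process that describes a path made up of a succession of random steps on some mathematical space, such as the integers. The term also conveys that future steps and directions cannot be predicted from past behavior. The core idea is that any conserved quantity carried by a random walker corresponds to a diffusive transport law. A random walk is closely related to Brownian motion and serves as its idealized mathematical model. It is currently applied mainly in web link analysis and in financial stock markets.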
