Graph Neural Networks (GNNs) play a crucial role in many fields. However, most existing deep graph learning frameworks assume that graphs are pre-stored and static, and they do not support training on graph streams. In contrast, many real-world graphs are dynamic and carry temporal information. We introduce GNNFlow, a distributed framework for efficient continuous temporal graph representation learning on dynamic graphs using multi-GPU machines. GNNFlow introduces an adaptive, time-indexed, block-based data structure that balances memory usage against the efficiency of graph updates and sampling operations. It uses a hybrid GPU-CPU graph data placement to enable fast GPU-based temporal neighborhood sampling, together with kernel optimizations that accelerate the sampling process. It also includes a dynamic GPU cache for node and edge features that maximizes cache hit rates through reuse and restoration strategies. GNNFlow supports distributed training across multiple machines, using static scheduling to ensure load balance. We implement GNNFlow on top of DGL and PyTorch. Our experiments show that GNNFlow achieves up to 21.1x faster continuous learning than existing systems.
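To make the idea of a time-indexed, block-based adjacency structure more concrete, here is a minimal CPU-only sketch in plain Python. It is not GNNFlow's implementation: the class names (`EdgeBlock`, `TemporalBlockAdjList`), the capacity-doubling policy, and the "k most recent neighbors before time t" sampling rule are all illustrative assumptions. The sketch shows why blocks help: new edges from a stream are appended to the newest block without rewriting old data, and a sampler can skip entire blocks whose earliest timestamp is already past the query time.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EdgeBlock:
    """Hypothetical fixed-capacity block of out-edges for one node, in time order."""
    capacity: int
    dst: List[int] = field(default_factory=list)
    ts: List[float] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.ts) >= self.capacity

    @property
    def min_ts(self) -> float:
        # Blocks are never empty once created, so ts[0] always exists.
        return self.ts[0]


class TemporalBlockAdjList:
    """Hypothetical time-indexed, block-based adjacency list (illustration only)."""

    def __init__(self, init_capacity: int = 4, max_capacity: int = 64):
        self.init_capacity = init_capacity
        self.max_capacity = max_capacity
        self.blocks: Dict[int, List[EdgeBlock]] = {}  # node id -> blocks, oldest first

    def add_edges(self, src: List[int], dst: List[int], ts: List[float]) -> None:
        # Assumption: edges for each node arrive in non-decreasing time order,
        # as in a graph stream, so updates are pure appends.
        for s, d, t in zip(src, dst, ts):
            chain = self.blocks.setdefault(s, [])
            if chain and t < chain[-1].ts[-1]:
                raise ValueError("edges must arrive in time order per node")
            if not chain or chain[-1].is_full():
                # Assumed adaptive sizing: each new block doubles the previous
                # capacity (capped), so low-degree nodes waste little memory.
                cap = (self.init_capacity if not chain
                       else min(2 * chain[-1].capacity, self.max_capacity))
                chain.append(EdgeBlock(cap))
            chain[-1].dst.append(d)
            chain[-1].ts.append(t)

    def sample_recent(self, node: int, t: float, k: int) -> List[Tuple[int, float]]:
        """Return up to k most recent (neighbor, timestamp) pairs with timestamp < t."""
        out: List[Tuple[int, float]] = []
        for block in reversed(self.blocks.get(node, [])):
            if block.min_ts >= t:
                continue  # every edge in this block is at or after t: skip it unscanned
            end = bisect_left(block.ts, t)  # index of first timestamp >= t
            for i in range(end - 1, -1, -1):  # walk backwards: newest first
                out.append((block.dst[i], block.ts[i]))
                if len(out) == k:
                    return out
        return out


if __name__ == "__main__":
    g = TemporalBlockAdjList(init_capacity=2)
    g.add_edges([0] * 5, [1, 2, 3, 4, 5], [1.0, 2.0, 3.0, 4.0, 5.0])
    print([b.capacity for b in g.blocks[0]])  # [2, 4]
    print(g.sample_recent(0, t=4.5, k=2))     # [(4, 4.0), (3, 3.0)]
```

In a GPU setting the same layout would presumably keep block metadata (such as each block's timestamp range) where the sampling kernels can read it quickly, while bulk edge data could live in host memory; that placement detail is beyond this toy example.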