Dynamic Graph Neural Networks (DGNNs) have shown a strong capability for learning on dynamic graphs by exploiting both spatial and temporal features. Although DGNNs have recently received considerable attention from the AI community and various DGNN models have been proposed, building a distributed system for efficient DGNN training remains challenging. It is well recognized that how the dynamic graph is partitioned and how workloads are assigned to multiple GPUs play a critical role in training acceleration. Existing works partition a dynamic graph into snapshots or temporal sequences, which works well only when the graph has a uniform spatio-temporal structure. In practice, however, dynamic graphs are not uniformly structured: some snapshots are very dense while others are sparse. To address this issue, we propose DGC, a distributed DGNN training system that achieves a 1.25x to 7.52x speedup over the state of the art on our testbed. DGC's success stems from a new graph partitioning method that partitions dynamic graphs into chunks, which are essentially subgraphs with modest training workloads and few interconnections. The partitioning algorithm is based on graph coarsening and runs very fast on large graphs. In addition, DGC has a highly efficient runtime, powered by the proposed chunk fusion and adaptive stale aggregation techniques. Extensive experimental results on three typical DGNN models and four popular dynamic graph datasets demonstrate the effectiveness of DGC.