Many Graph Neural Network (GNN) training systems have recently emerged to support efficient GNN training. Because GNNs embody complex data dependencies between training samples, GNN training must address data management challenges that differ from those of DNN training. These include data partitioning, batch preparation for mini-batch training, and data transfer between CPUs and GPUs. These steps take up a large proportion of training time, which makes data management especially important in GNN training. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis and evaluation of representative approaches. We conduct extensive experiments on various benchmark datasets and report many interesting and valuable findings. We also distill practical lessons from these experiments that can help guide the design of future GNN training systems.