Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Because GNNs embody complex data dependencies between training samples, GNN training must address data management challenges that differ from those in DNN training, such as data partitioning, batch preparation for mini-batch training, and data transfer between CPUs and GPUs. These steps take up a large proportion of training time, which makes data management especially significant in GNN training. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis and evaluation of representative approaches. We conduct extensive experiments on various benchmark datasets and report many interesting and valuable results. We also provide practical tips learned from these experiments, which should be helpful for designing future GNN training systems.