Graph Neural Networks (GNNs) are highly effective at learning from graph data. Full-graph GNN training generally achieves high accuracy, but its peak memory usage is large, and it runs out of memory on large graphs. A popular remedy is mini-batch GNN training, which, however, increases training variance and sacrifices model accuracy. In this paper, we propose SpanGNN, a new memory-efficient GNN training method based on spanning subgraphs. SpanGNN trains GNN models over a sequence of spanning subgraphs, which are built up starting from an empty structure. To avoid excessive peak memory consumption, SpanGNN selects a set of edges from the original graph to incrementally update the spanning subgraph between epochs. To preserve model accuracy, we introduce two edge sampling strategies (variance-reduced and noise-reduced) that help SpanGNN select high-quality edges for GNN learning. We conduct experiments with SpanGNN on widely used datasets, demonstrating its advantages in model performance and peak memory usage.
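The incremental construction described in the abstract can be illustrated with a minimal sketch. It is not SpanGNN's implementation: the function name `spanning_subgraph_schedule` and its parameters are hypothetical, the sketch assumes that selected edges are only ever added (never removed), and it uses uniform random selection as a stand-in for the variance-reduced and noise-reduced strategies, whose details the abstract does not give.

```python
import random


def spanning_subgraph_schedule(edges, num_epochs, edges_per_epoch, seed=0):
    """Yield (epoch, edge_list) pairs for a growing spanning subgraph.

    Assumptions (not taken from the paper): edges are added and never
    removed, and they are chosen uniformly at random. The node set is
    implicit and always the full node set of the original graph, which is
    what makes each snapshot a *spanning* subgraph.
    """
    rng = random.Random(seed)
    remaining = list(edges)
    rng.shuffle(remaining)  # uniform stand-in for a quality-aware sampler
    selected = []           # the first snapshot grows from an empty edge set
    for epoch in range(num_epochs):
        # Move the next batch of edges from the original graph into the subgraph.
        batch = remaining[:edges_per_epoch]
        remaining = remaining[edges_per_epoch:]
        selected.extend(batch)
        # A real training loop would run one GNN epoch on `selected` here.
        yield epoch, list(selected)


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]
    for epoch, sub_edges in spanning_subgraph_schedule(toy_edges, 3, 2):
        print(epoch, len(sub_edges), sub_edges)
```

In this sketch the number of edges held in any epoch is bounded by `edges_per_epoch * (epoch + 1)`, which is the sense in which capping the edge budget would cap the size of the graph structure used for training. Replacing the shuffle with a weighted sampler is where a quality-aware strategy would plug in.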