Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that use a memory module to extract, distill, and memorize long-term temporal dependencies, which gives them superior performance over memory-less counterparts. However, training MDGNNs requires handling entangled temporal and structural dependencies: data sequences must be processed sequentially and in chronological order to capture accurate temporal patterns. During batch training, the temporal data points within a batch are processed in parallel, and the temporal dependencies among them are neglected. This issue, referred to as temporal discontinuity, restricts the effective temporal batch size, limits data parallelism, and reduces the flexibility of MDGNNs in industrial applications. This paper studies the efficient training of MDGNNs at scale, focusing on temporal discontinuity when training with large temporal batch sizes. We first conduct a theoretical study of the impact of temporal batch size on the convergence of MDGNN training. Based on this analysis, we propose PRES, an iterative prediction-correction scheme combined with a memory coherence learning objective, which mitigates the effect of temporal discontinuity and enables MDGNNs to be trained with significantly larger temporal batches without sacrificing generalization performance. Experimental results demonstrate that our approach enables up to a 4× larger temporal batch (3.4× speed-up) during MDGNN training.
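
To make temporal discontinuity concrete, the following is a minimal toy sketch in Python and not the PRES method or any MDGNN implementation. The `update` rule, the `run` helper, and the four-event stream are all hypothetical and chosen only for illustration. Each event updates the memory of its two endpoint nodes; within a batch, every event reads the memory as it stood at the start of the batch, so dependencies between events in the same batch are ignored.

```python
def update(own, other):
    # Hypothetical memory update: average own state with the neighbor's
    # state and add a constant "message" term. Assumed for illustration only.
    return 0.5 * own + 0.5 * other + 1.0


def run(events, num_nodes, batch_size):
    """Process chronologically ordered (src, dst) events in temporal batches."""
    memory = [0.0] * num_nodes
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        # All events in the batch read this snapshot, so an event never
        # sees memory written by an earlier event of the same batch.
        snapshot = list(memory)
        for src, dst in batch:
            memory[src] = update(snapshot[src], snapshot[dst])
            memory[dst] = update(snapshot[dst], snapshot[src])
    return memory


def max_gap(a, b):
    # Largest per-node difference between two memory states.
    return max(abs(x - y) for x, y in zip(a, b))


events = [(0, 1), (1, 2), (2, 3), (0, 3)]   # already in chronological order
sequential = run(events, num_nodes=4, batch_size=1)  # fully chronological
batch_2 = run(events, num_nodes=4, batch_size=2)
batch_4 = run(events, num_nodes=4, batch_size=4)

print("gap at batch size 2:", max_gap(sequential, batch_2))
print("gap at batch size 4:", max_gap(sequential, batch_4))
```

With a batch size of 1 the toy reproduces strictly chronological processing. In this example, the memory drifts further from that reference as the batch grows, which is the kind of discrepancy that limits the usable temporal batch size.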