NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams

Existing Graph Neural Network (GNN) training frameworks are designed to help developers easily build high-performance GNN implementations. However, most of them assume that input graphs are static and ignore the fact that most real-world graphs evolve constantly. Although many dynamic GNN models have been proposed to learn from evolving graphs, training them differs dramatically from training traditional GNNs, because it must capture both the spatial and the temporal dependencies of graph updates. This poses new challenges for designing dynamic GNN training frameworks. First, the traditional batched training method fails to capture real-time structural evolution information. Second, the temporal dependencies between updates make parallel training hard to design. Third, there is little system support that lets users implement dynamic GNNs efficiently. In this paper, we present NeutronStream, a framework for training dynamic GNN models. NeutronStream abstracts the input dynamic graph into a chronologically updated stream of events and processes the stream with an optimized sliding window to incrementally capture the spatial-temporal dependencies of events. Furthermore, NeutronStream provides a parallel execution engine that overcomes the bottleneck of sequential event processing to achieve high performance. NeutronStream also integrates a built-in graph storage structure that supports dynamic updates, and it provides a set of easy-to-use APIs that allow users to express their dynamic GNNs. Our experimental results demonstrate that, compared to state-of-the-art dynamic GNN implementations, NeutronStream achieves speedups ranging from 1.48× to 5.87× and an average accuracy improvement of 3.97%.
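
To make the two core ideas of the abstract concrete (a sliding window over a chronological event stream, and parallel processing that still respects time dependencies), the following is a minimal sketch under stated assumptions; it is not NeutronStream's implementation. The `Event` record, the fixed `window_size`/`stride` windowing policy, and the helper names `sliding_windows` and `parallel_levels` are all hypothetical. The parallelization rule is a swapped-in, commonly used heuristic: two events conflict if they touch a shared vertex, so the later one must wait, and events in the same "level" touch disjoint vertices and could be processed concurrently.

```python
from collections import namedtuple

# Hypothetical event record: an edge (src, dst) arriving at timestamp t.
Event = namedtuple("Event", ["src", "dst", "t"])


def sliding_windows(events, window_size, stride):
    """Yield (possibly overlapping) windows over the time-ordered event stream.

    Assumption: a fixed-size window advanced by a fixed stride; the paper's
    "optimized" sliding window is not specified in the abstract.
    """
    ordered = sorted(events, key=lambda e: e.t)
    start = 0
    while start < len(ordered):
        yield ordered[start:start + window_size]
        if start + window_size >= len(ordered):
            break  # this window already reaches the end of the stream
        start += stride


def parallel_levels(window):
    """Group a time-ordered window into levels of non-conflicting events.

    An event's level is one more than the latest level of any earlier event
    that touched its src or dst, so per-vertex chronological order is kept
    and events within one level share no vertex.
    """
    last_level = {}  # vertex -> level of the most recent event touching it
    levels = []
    for ev in window:
        lvl = 1 + max(last_level.get(ev.src, -1), last_level.get(ev.dst, -1))
        if lvl == len(levels):
            levels.append([])
        levels[lvl].append(ev)
        last_level[ev.src] = lvl
        last_level[ev.dst] = lvl
    return levels


if __name__ == "__main__":
    stream = [Event(0, 1, 1), Event(2, 3, 2), Event(1, 2, 3),
              Event(4, 5, 4), Event(3, 4, 5)]
    for window in sliding_windows(stream, window_size=4, stride=2):
        print([[(e.src, e.dst) for e in lvl] for lvl in parallel_levels(window)])
```

On this toy stream, the first window yields the levels `[(0, 1), (2, 3), (4, 5)]` and `[(1, 2)]`: the event `(1, 2)` must wait because it shares vertices with two earlier events. Overlapping windows (stride smaller than the window size) let later windows see recent history again, which is one simple way to capture dependencies that cross window boundaries, at the cost of recomputation.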
