Graph Neural Networks (GNNs) on streaming graphs have gained increasing popularity. However, their practical deployment remains challenging, because inference relies on Runtime Embedding Computation (RTEC) to capture recent graph changes. RTEC incurs heavyweight multi-hop graph traversal, which significantly undermines computation efficiency. We observe that the intermediate results for large portions of the graph remain unchanged as the graph evolves, so redundant computation can be eliminated through carefully designed incremental methods. In this work, we propose an efficient framework for incrementalizing RTEC on streaming graphs. The key idea is to decouple GNN computation into a set of generalized, fine-grained operators and safely reorder them, transforming the expensive full-neighbor GNN computation into a more efficient form over the affected subgraph. With this design, our framework preserves the semantics and accuracy of the original full-neighbor computation while supporting a wide range of GNN models with complex message-passing patterns. To further scale to graphs with massive historical results, we develop a GPU-CPU co-processing system that offloads embeddings to CPU memory with communication-optimized scheduling. Experiments across diverse graph sizes and GNN models show that our method reduces computation by 64%-99% and achieves 1.7x-145.8x speedups over existing solutions.
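
To make the idea of recomputing only the affected subgraph concrete, the following is a minimal sketch, not the framework described above. It assumes a single layer with scalar node features and sum aggregation, where a cached neighbor sum can be patched when edges are added or removed. All function and variable names (`full_aggregate`, `incremental_aggregate`, `agg`) are hypothetical and chosen only for illustration.

```python
# Hypothetical one-layer illustration with sum aggregation:
#   agg[v] = sum of x[u] over all edges (u, v)
# These names are assumptions for this sketch, not the paper's API.

def full_aggregate(num_nodes, edges, x):
    """Full-neighbor computation: rebuild every node's neighbor sum."""
    agg = [0] * num_nodes
    for src, dst in edges:
        agg[dst] += x[src]
    return agg

def incremental_aggregate(agg, x, added, removed):
    """Patch cached sums in place, touching only destinations of changed edges.

    Returns the set of affected nodes, i.e. the nodes whose cached
    result actually needed an update.
    """
    affected = set()
    for src, dst in added:
        agg[dst] += x[src]
        affected.add(dst)
    for src, dst in removed:
        agg[dst] -= x[src]
        affected.add(dst)
    return affected

# Integer features keep the comparison below exact.
x = [1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
agg = full_aggregate(len(x), edges, x)  # cached historical result

# The graph evolves: one edge arrives and one edge is deleted.
added, removed = [(3, 1)], [(0, 2)]
affected = incremental_aggregate(agg, x, added, removed)

# Reference result: recompute everything on the updated edge list.
new_edges = [e for e in edges if e not in removed] + added
reference = full_aggregate(len(x), new_edges, x)
```

In this sketch the incremental path touches two nodes instead of rescanning every edge, and it yields the same sums as the full recomputation. It relies on sum being invertible; aggregators such as max or attention-weighted sums cannot be patched this simply, and a multi-layer model would also need to propagate the affected set to the next hop.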