Temporal Graph Neural Networks (TGNs) achieve state-of-the-art performance on dynamic graph tasks, yet existing systems focus exclusively on accelerating training. At inference time, every new edge triggers $O(|V|)$ embedding updates even though only a small fraction of nodes is affected. We present \textbf{StreamTGN}, the first streaming TGN inference system that exploits the inherent locality of temporal graph updates: in an $L$-layer TGN, a new edge affects only nodes within $L$ hops of its endpoints, typically fewer than 0.2\% of nodes on million-node graphs. StreamTGN maintains persistent GPU-resident node memory and uses dirty-flag propagation to identify the affected set $\mathcal{A}$, reducing per-batch complexity from $O(|V|)$ to $O(|\mathcal{A}|)$ with zero accuracy loss. Drift-aware adaptive rebuild scheduling and batched streaming with relaxed ordering further increase throughput. Experiments on eight temporal graphs (2K--2.6M nodes) show 4.5$\times$--739$\times$ speedup for TGN and up to 4,207$\times$ for TGAT, with identical accuracy. StreamTGN is orthogonal to training optimizations: combining SWIFT with StreamTGN yields 24$\times$ end-to-end speedup across three architectures (TGN, TGAT, DySAT).
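The locality argument can be made concrete with a small sketch of how an affected set $\mathcal{A}$ might be computed. This is not the StreamTGN implementation, which the abstract describes only as GPU-resident dirty-flag propagation. It is a CPU-side illustration that assumes an undirected adjacency dictionary and a breadth-first expansion bounded by $L$ hops. The names `affected_set`, `adj`, `new_edges`, and `num_layers` are hypothetical.

```python
from collections import deque


def affected_set(adj, new_edges, num_layers):
    """Return the nodes within `num_layers` hops of any new-edge endpoint.

    adj        -- dict mapping node -> set of neighbours (assumed undirected);
                  it is updated in place with the new edges.
    new_edges  -- iterable of (u, v) pairs arriving in the current batch.
    num_layers -- L, the number of message-passing layers (assumption: one
                  hop of influence per layer).
    """
    dirty = set()       # nodes flagged as needing an embedding update
    frontier = deque()  # (node, hops from the nearest endpoint)

    for u, v in new_edges:
        # Insert the edge first so propagation sees the updated graph.
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        for node in (u, v):
            if node not in dirty:
                dirty.add(node)
                frontier.append((node, 0))

    # Multi-source BFS: each node is first reached at its minimum hop count.
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_layers:
            continue  # do not propagate beyond L hops
        for neighbour in adj.get(node, ()):
            if neighbour not in dirty:
                dirty.add(neighbour)
                frontier.append((neighbour, depth + 1))

    return dirty
```

Under these assumptions, only the embeddings of nodes in the returned set would be recomputed for the batch, while every other node would reuse its cached embedding. The work per batch is therefore bounded by the size of the $L$-hop neighbourhood rather than by $|V|$.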