In this paper, we propose mspace, an online algorithm for forecasting node features in temporal graphs that captures both the spatial cross-correlation among different nodes and the temporal auto-correlation within a node. The algorithm supports both probabilistic and deterministic multi-step forecasting, making it applicable to estimation and generation tasks. Comparative evaluations against various baselines, including temporal graph neural network (TGNN) models and classical Kalman filters, demonstrate that mspace performs on par with the state-of-the-art methods and even surpasses them on some datasets. Importantly, mspace performs consistently across datasets with varying training sizes, a notable advantage over TGNN models, which require abundant training samples to effectively learn the spatiotemporal trends in the data. Employing mspace is therefore advantageous in scenarios where training samples are limited. We also establish theoretical bounds on the multi-step forecasting error of mspace and show that the error scales linearly with the number of forecast steps $q$, i.e., $\mathcal{O}(q)$. For an asymptotically large number of nodes $n$ and timesteps $T$, the computational complexity of mspace grows linearly with both $n$ and $T$, i.e., $\mathcal{O}(nT)$, while its space complexity remains constant at $\mathcal{O}(1)$. We compare the performance of several mspace variants against ten recent TGNN baselines and two classical baselines, ARIMA and the Kalman filter, across ten real-world datasets. Additionally, we propose a technique for generating synthetic datasets to aid in evaluating node feature forecasting methods, with the potential to serve as a benchmark for future research. Lastly, we investigate the interpretability of different mspace variants by analyzing model parameters alongside dataset characteristics to derive model-centric and data-centric insights.