Covariates play an indispensable role in practical time series forecasting, offering rich context from the past and sometimes extending into the future. However, their availability varies depending on the scenario, and situations often involve multiple target variables simultaneously. Moreover, the cross-variate dependencies between them are multi-granular, with some covariates having a short-term impact on target variables and others showing long-term correlations. This heterogeneity and the intricate dependencies arising in covariate-informed forecasting present significant challenges to existing deep models. To address these issues, we propose CITRAS, a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future forecasting horizon. While preserving the strong autoregressive capabilities of the canonical Transformer, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future known covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing transforms locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting, demonstrating its versatile ability to leverage cross-variate dependency for improved forecasting accuracy.
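To make the two mechanisms concrete, the following is a minimal, dependency-free sketch of one possible reading of them, not the authors' implementation. It assumes that KV Shift pairs the target query at patch i with the covariate key/value at patch i + 1 (so the last query can see a covariate patch known for the forecast horizon), and that Attention Score Smoothing is a causal exponential moving average over the patch index. The function names, the `alpha` parameter, and the choice of an exponential moving average are all hypothetical and do not come from the abstract.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shift_covariate_patches(cov_patches: Sequence[T], num_target_patches: int) -> List[T]:
    """Hypothetical KV Shift: align target patch i with covariate patch i + 1.

    Assumes cov_patches holds the past patches plus at least one patch that
    is already known for the forecast horizon.
    """
    if len(cov_patches) < num_target_patches + 1:
        raise ValueError("need one more covariate patch than target patches")
    # Drop the first covariate patch so index i now refers to patch i + 1.
    return list(cov_patches[1:num_target_patches + 1])


def smooth_attention_scores(scores: Sequence[Sequence[float]], alpha: float = 0.5) -> List[List[float]]:
    """Hypothetical Attention Score Smoothing: causal EMA over the patch index.

    scores[i][v] is the attention weight a target places on variate v at
    patch i. Each smoothed row is a convex combination of the current row and
    the previous smoothed row, so rows that sum to 1 keep summing to 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[List[float]] = []
    for row in scores:
        if not smoothed:
            # The first patch has no history to smooth against.
            smoothed.append(list(row))
        else:
            prev = smoothed[-1]
            smoothed.append([alpha * a + (1.0 - alpha) * p for a, p in zip(row, prev)])
    return smoothed
```

Under these assumptions, `shift_covariate_patches(["c0", "c1", "c2", "c3"], 3)` returns `["c1", "c2", "c3"]`, and smoothing the scores `[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]` with `alpha=0.5` damps the patch-to-patch flips into `[[1.0, 0.0], [0.5, 0.5], [0.75, 0.25]]`, which illustrates how local patch-wise scores could be turned into a steadier variate-level dependency.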