Autoregressive attention-based time series forecasting (TSF) has drawn increasing interest, with mechanisms such as linear attention sometimes outperforming vanilla attention. However, deeper Transformer architectures frequently misalign with autoregressive objectives, obscuring the underlying vector autoregressive (VAR) structure embedded in linear attention and hindering the model's ability to capture the data-generating processes in TSF. In this work, we first show that a single linear attention layer can be interpreted as a dynamic VAR structure. We then explain that existing multi-layer Transformers have structural mismatches with the autoregressive forecasting objective, which impair interpretability and generalization. To address this, we show that by rearranging the MLP, attention, and input-output flow, multi-layer linear attention can also be aligned as a VAR model. We then propose Structural Aligned Mixture of VAR (SAMoVAR), a linear Transformer variant that integrates interpretable dynamic VAR weights for multivariate TSF. By aligning the Transformer architecture with the autoregressive objective, SAMoVAR delivers improved performance, interpretability, and computational efficiency compared to state-of-the-art TSF models.
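To make the claimed correspondence concrete, here is a minimal sketch in standard unnormalized causal linear-attention notation (the projections $W_Q$, $W_K$, $W_V$ and the omission of feature maps and normalization are simplifying assumptions, not necessarily the paper's exact formulation). With $q_t = W_Q x_t$, $k_s = W_K x_s$, and $v_s = W_V x_s$, the layer output is
$$
y_t \;=\; \sum_{s=1}^{t} \bigl(q_t^\top k_s\bigr)\, v_s \;=\; \sum_{s=1}^{t} \underbrace{\bigl(x_t^\top W_Q^\top W_K\, x_s\bigr)\, W_V}_{A_{t,s}}\; x_s,
$$
i.e., a vector autoregression whose coefficient matrices $A_{t,s}$ are generated dynamically from the data, in contrast to a classical VAR($p$) model $x_t = \sum_{i=1}^{p} A_i\, x_{t-i} + \epsilon_t$ with fixed coefficients $A_i$.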