We propose an Autoregressive (AR) Moving-Average (MA) attention structure that can be integrated into various linear attention mechanisms, enhancing their ability to capture both long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer can match the best baselines when appropriate tokenization and training methods are applied. Then, inspired by the ARMA model from statistics and by recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while preserving the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements of local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attention mechanisms on TSF tasks, achieving state-of-the-art results.
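For concreteness, below is a minimal sketch of how an MA term with indirectly generated weights might be grafted onto a causal linear attention. It is an illustrative assumption, not the paper's implementation: the elu+1 feature map, the fixed lag window, the residual proxy `v - ar`, and all identifiers (`ARMAAttention`, `theta`, `window`) are hypothetical choices made here for brevity.

```python
# Hypothetical ARMA-style attention sketch (names and design choices assumed,
# not taken from the paper): a causal linear-attention AR branch plus an MA
# branch over recent residuals, with MA coefficients generated indirectly
# from the current token rather than stored as explicit parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ARMAAttention(nn.Module):
    def __init__(self, d_model: int, window: int = 4):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Indirect MA weight generation: a small per-token projection emits
        # `window` mixing coefficients on the fly, so no explicit per-pair
        # MA weight matrix is ever materialized.
        self.theta = nn.Linear(d_model, window)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q = F.elu(self.q(x)) + 1.0  # positive feature map (assumed)
        k = F.elu(self.k(x)) + 1.0
        v = self.v(x)

        # AR term: causal linear attention via running sums, linear in T.
        kv = torch.einsum('btd,bte->btde', k, v).cumsum(dim=1)  # sum_{s<=t} k_s v_s^T
        z = k.cumsum(dim=1)                                     # sum_{s<=t} k_s
        ar = torch.einsum('btd,btde->bte', q, kv)
        ar = ar / (torch.einsum('btd,btd->bt', q, z).unsqueeze(-1) + 1e-6)

        # MA term: a weighted sum of the last `window` residuals, where the
        # residual v_s - ar_s stands in for the one-step modeling error.
        resid = v - ar
        weights = torch.tanh(self.theta(x))            # (B, T, window)
        pad = F.pad(resid, (0, 0, self.window, 0))     # left-pad time axis
        ma = sum(
            weights[..., i:i + 1]
            * pad[:, self.window - 1 - i:T + self.window - 1 - i]  # lag i+1
            for i in range(self.window)
        )
        return ar + ma
```

The point of the indirect route in this sketch is that `theta` adds only O(d * window) parameters and the MA branch only a constant-window pass over the residuals, so the linear time complexity and parameter scale of the base AR attention are preserved, matching the trade-off the abstract describes.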