We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-Average (MA) components. It can be adapted to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and by recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. Using an indirect MA weight generation method, we incorporate the MA term while preserving the time complexity and parameter count of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements of local temporal impacts. Experimental results show that WAVE attention, which incorporates the ARMA structure, consistently improves the performance of various AR attention mechanisms on TSF tasks, achieving state-of-the-art results.
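For readers unfamiliar with the statistical inspiration, the sketch below simulates the classical ARMA(1, 1) process that motivates the attention design. The symbols `phi` (AR coefficient) and `theta` (MA coefficient) follow standard textbook notation and are not taken from the paper; this illustrates only the statistical model, not the WAVE mechanism itself.

```python
import numpy as np

# Classical ARMA(1, 1) recurrence:
#   x_t = phi * x_{t-1} + eps_t + theta * eps_{t-1}
# The AR term propagates dependence on past *values* (long-range structure),
# while the MA term models short-lived *shocks* (local structure) -- the two
# effects the abstract describes as being decoupled.

rng = np.random.default_rng(0)

def simulate_arma11(phi: float, theta: float, n: int, sigma: float = 1.0) -> np.ndarray:
    """Generate n samples from an ARMA(1, 1) process driven by Gaussian noise."""
    eps = rng.normal(0.0, sigma, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]
    return x

series = simulate_arma11(phi=0.8, theta=0.4, n=500)
```

With |phi| < 1 the process is stationary, so the simulated series fluctuates around zero rather than drifting, while the MA term adds correlated local noise on top of the AR dynamics.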