Transformers have achieved remarkable performance in multivariate time series (MTS) forecasting due to their capability to capture long-term dependencies. However, the canonical attention mechanism has two key limitations: (1) its quadratic time complexity limits the sequence length, and (2) it generates future values from the entire historical sequence. To address these limitations, we propose the Dozer Attention mechanism, which consists of three sparse components: (1) Local, in which each query attends exclusively to keys within a localized window of neighboring time steps; (2) Stride, in which each query attends to keys at predefined intervals; and (3) Vary, in which queries selectively attend to keys from a subset of the historical sequence. Notably, the size of this subset expands dynamically as the forecasting horizon extends. These three components are designed to capture essential attributes of MTS data, including locality, seasonality, and global temporal dependencies. Additionally, we present the Dozerformer framework, which incorporates the Dozer Attention mechanism for the MTS forecasting task. We evaluated the proposed Dozerformer framework against recent state-of-the-art methods on nine benchmark datasets and confirmed its superior performance. The code will be released after the manuscript is accepted.
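The three sparse components can be illustrated as boolean attention masks. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the parameters `window`, `stride`, and `base`, and the rule that the Vary subset grows by one key per forecast step are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import numpy as np

def local_mask(n_q, n_k, window):
    """Local: True where key j lies within `window` steps of query i."""
    q = np.arange(n_q)[:, None]
    k = np.arange(n_k)[None, :]
    return np.abs(q - k) <= window

def stride_mask(n_q, n_k, stride):
    """Stride: True where key j is a whole number of strides away from query i."""
    q = np.arange(n_q)[:, None]
    k = np.arange(n_k)[None, :]
    return (q - k) % stride == 0

def vary_mask(n_hist, n_horizon, base):
    """Vary: the query for forecast step h + 1 sees the most recent
    base + h history keys (assumed growth rule), capped at n_hist."""
    h = np.arange(n_horizon)[:, None]
    k = np.arange(n_hist)[None, :]
    visible = np.minimum(base + h, n_hist)
    return k >= n_hist - visible

def masked_softmax(scores, mask):
    """Softmax over keys, with masked-out positions given zero weight.
    Assumes every query row keeps at least one visible key."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Self-attention over 8 time steps: union of the Local and Stride patterns.
self_mask = local_mask(8, 8, window=1) | stride_mask(8, 8, stride=4)

# Cross-attention from 4 forecast steps to 8 history steps.
cross_mask = vary_mask(n_hist=8, n_horizon=4, base=2)
print(cross_mask.sum(axis=1))  # visible keys per forecast step: [2 3 4 5]

# Uniform scores make the effect of the mask easy to read off.
weights = masked_softmax(np.zeros((4, 8)), cross_mask)
```

In this toy setting the combined Local and Stride mask keeps 30 of the 64 query-key pairs, and the Vary mask widens from 2 to 5 history keys as the forecast step increases, which mirrors the idea that longer horizons draw on a larger portion of the history.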