Multivariate time series forecasting is widely used in practical scenarios. Recently, Transformer-based models have shown significant potential in forecasting tasks because they can capture long-range dependencies. However, recent studies in vision and NLP show that the role of attention modules is unclear and that they can be replaced by other token aggregation operations. This paper investigates the contributions and deficiencies of attention mechanisms in time series forecasting. Specifically, we find that (1) attention is not necessary for capturing temporal dependencies, (2) entanglement and redundancy in capturing temporal and channel interactions affect forecasting performance, and (3) it is important to model the mapping between the input sequence and the prediction sequence. To this end, we propose MTS-Mixers, which use two factorized modules to capture temporal and channel dependencies. Experimental results on several real-world datasets show that MTS-Mixers outperform existing Transformer-based models while being more efficient.
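To make the idea of two factorized modules concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the simplest possible form of each module: a linear map over the time axis, a linear map over the channel axis, and a linear projection from the input length to the prediction length. The function name `factorized_mixer_forecast`, the weight names, the residual connections, and the random weights are all illustrative assumptions.

```python
import numpy as np

def factorized_mixer_forecast(x, w_time, w_chan, w_proj):
    """Hypothetical sketch of factorized temporal and channel mixing.

    x:      (n, c) input window with n time steps and c channels
    w_time: (n, n) weights that mix information across time steps
    w_chan: (c, c) weights that mix information across channels
    w_proj: (m, n) weights that map n input steps to m predicted steps
    """
    h = x + w_time @ x   # temporal mixing, with an assumed residual connection
    h = h + h @ w_chan   # channel mixing, with an assumed residual connection
    return w_proj @ h    # (m, c) forecast: input-to-prediction mapping

# Random, untrained weights: this only demonstrates the shapes involved.
rng = np.random.default_rng(0)
n, c, m = 96, 7, 24
x = rng.standard_normal((n, c))
y = factorized_mixer_forecast(
    x,
    0.01 * rng.standard_normal((n, n)),
    0.01 * rng.standard_normal((c, c)),
    rng.standard_normal((m, n)) / n,
)
print(y.shape)  # (24, 7)
```

Because the temporal and channel maps act on separate axes, each can be replaced independently (for example by attention or another aggregation operation) without changing the rest of the sketch.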