iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

The recent boom of linear forecasting models calls into question the ongoing enthusiasm for architectural modifications of Transformer-based forecasters. These forecasters use Transformers to model global dependencies over temporal tokens of time series, where each token is formed from multiple variates at the same timestamp. However, Transformers struggle to forecast series with larger lookback windows because of performance degradation and computation explosion. Moreover, the embedding of each temporal token fuses multiple variates that represent potentially delayed events and distinct physical measurements, which may fail to learn variate-centric representations and result in meaningless attention maps. In this work, we reflect on the duties each Transformer component is suited to and repurpose the Transformer architecture without any modification to the basic components. We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of each individual series are embedded into variate tokens, which the attention mechanism uses to capture multivariate correlations; meanwhile, the feed-forward network is applied to each variate token to learn nonlinear representations. iTransformer achieves state-of-the-art performance on challenging real-world datasets, which further empowers the Transformer family with improved performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a strong alternative as the fundamental backbone of time series forecasting. Code is available at this repository: https://github.com/thuml/iTransformer.
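
The inverted data flow described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation in the linked repository: it uses a single attention head, a single block, and random untrained weights, and the names `init_params` and `inverted_block_forecast` as well as all sizes are hypothetical. It shows that each variate's whole lookback series becomes one token, that attention then mixes information across variates (an N x N map), and that the feed-forward network acts on each variate token separately.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(h, eps=1e-5):
    # Normalize each variate token over its feature dimension.
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def init_params(lookback, horizon, d_model, d_ff, seed=0):
    # Hypothetical random weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    w = lambda m, n: rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return {
        "embed": w(lookback, d_model),  # whole series -> one variate token
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w1": w(d_model, d_ff), "w2": w(d_ff, d_model),
        "proj": w(d_model, horizon),    # variate token -> future series
    }

def inverted_block_forecast(x, p):
    """x: (lookback T, variates N) -> forecast (horizon S, N), attention (N, N)."""
    tokens = x.T @ p["embed"]                        # (N, D): one token per variate
    q, k, v = tokens @ p["wq"], tokens @ p["wk"], tokens @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, N): variate-to-variate weights
    h = layer_norm(tokens + attn @ v @ p["wo"])      # residual + norm
    ff = np.maximum(h @ p["w1"], 0.0) @ p["w2"]      # FFN applied to each token independently
    h = layer_norm(h + ff)
    return (h @ p["proj"]).T, attn                   # (S, N), (N, N)

# Usage: 96 past time points, 7 variates, 24-step forecast (sizes are arbitrary).
x = np.random.default_rng(1).normal(size=(96, 7))
params = init_params(lookback=96, horizon=24, d_model=16, d_ff=32)
y, attn = inverted_block_forecast(x, params)
print(y.shape, attn.shape)  # (24, 7) (7, 7)
```

Because tokens index variates rather than timestamps, the attention map has one row and column per variate, and a longer lookback window only enlarges the embedding matrix, not the number of tokens that attention has to compare.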
