Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and compute requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture composed exclusively of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting the vision MLP-Mixer to time series and introduce empirically validated components to enhance accuracy. These include a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone to explicitly model time-series properties such as hierarchy and channel correlations. We also propose a hybrid channel modeling approach to handle noisy channel interactions and to generalize across diverse datasets, both common challenges for existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal compute. Moreover, TSMixer's modular design makes it compatible with both supervised and masked self-supervised learning methods, making it a promising building block for time-series foundation models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong Patch-Transformer benchmarks by 1-2% while reducing memory and runtime by 2-3X.
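To make the ideas of patching, MLP mixing, and gated attention more concrete, here is a minimal pure-Python sketch for a single channel. It is an illustration under stated assumptions, not the authors' implementation: the function names (`make_patches`, `gated_attention`, `mlp_block`, `mixer_layer`), the ReLU activation, the softmax form of the gate, the residual connection, and the random weights are all hypothetical choices, and normalization, channel mixing, and the reconciliation heads are omitted.

```python
import math
import random


def make_patches(series, patch_len, stride):
    """Split a 1-D series into (possibly overlapping) fixed-length patches."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]


def linear(x, w, b):
    """Compute W x + b for a vector x, where w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def gated_attention(x, w, b):
    """Assumed gate: rescale each feature by softmax(W x + b)."""
    gate = softmax(linear(x, w, b))
    return [xi * gi for xi, gi in zip(x, gate)]


def make_params(dim, hidden, rng):
    """Random weights for one MLP block (illustrative, untrained)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]
    return {
        "w1": mat(hidden, dim), "b1": [0.0] * hidden,
        "w2": mat(dim, hidden), "b2": [0.0] * dim,
        "wg": mat(dim, dim), "bg": [0.0] * dim,
    }


def mlp_block(x, p):
    """Two-layer MLP, then the gate, then a residual connection."""
    h = [max(0.0, v) for v in linear(x, p["w1"], p["b1"])]  # ReLU (assumed)
    out = gated_attention(linear(h, p["w2"], p["b2"]), p["wg"], p["bg"])
    return [xi + oi for xi, oi in zip(x, out)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def mixer_layer(patches, patch_params, feature_params):
    """Mix across patches first, then within each patch."""
    # Across patches: each column holds one within-patch position
    # taken from every patch.
    cols = [mlp_block(c, patch_params) for c in transpose(patches)]
    patches = transpose(cols)
    # Within each patch: mix the values of a single patch.
    return [mlp_block(p, feature_params) for p in patches]


if __name__ == "__main__":
    rng = random.Random(0)
    series = [math.sin(0.3 * t) for t in range(12)]
    patches = make_patches(series, patch_len=4, stride=4)  # 3 patches of 4
    out = mixer_layer(patches,
                      make_params(len(patches), 8, rng),
                      make_params(4, 8, rng))
    print(len(out), len(out[0]))
```

Both mixing steps reuse the same `mlp_block`; only the axis it is applied along changes, which is the core of the MLP-Mixer idea. A forecasting head, for example a linear layer over the flattened output, would be attached on top of a stack of such layers.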