Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and compute requirements are a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture composed exclusively of multi-layer perceptron (MLP) modules for multivariate forecasting and representation learning on patched time series. Inspired by the success of MLP-Mixer in computer vision, we adapt it to time series, addressing the challenges this raises and introducing empirically validated components that improve accuracy. These include a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone to explicitly model time-series properties such as hierarchy and channel correlations. We also propose a novel hybrid channel modeling approach and a simple gating mechanism that together handle noisy channel interactions and generalize across diverse datasets. With these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal compute usage. Moreover, TSMixer's modular design is compatible with both supervised and masked self-supervised learning, making it a promising building block for time-series foundation models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong Patch-Transformer benchmarks by 1-2% while reducing memory and runtime by 2-3X. The source code of our model is officially released as PatchTSMixer on Hugging Face. Model: https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer Examples: https://github.com/ibm/tsfm/#notebooks-links
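To make the architecture described in the abstract more concrete, the following is a minimal NumPy sketch of one MLP-only mixing block on a patched multivariate series, with a simple softmax gate. It is an illustration under stated assumptions, not the released PatchTSMixer code: the function names (`make_patches`, `mix_along`, `mixer_block`), the parameter layout, the hidden size, and the choice to normalize along the mixed axis are all hypothetical simplifications, and the reconciliation heads are omitted.

```python
import numpy as np

def make_patches(x, patch_len, stride):
    """Split a (channels, seq_len) series into (channels, num_patches, patch_len)."""
    _, seq_len = x.shape
    starts = range(0, seq_len - patch_len + 1, stride)
    return np.stack([x[:, s:s + patch_len] for s in starts], axis=1)

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(rng, size, hidden):
    """Hypothetical parameter layout: a two-layer MLP plus a gating projection."""
    return {
        "w1": rng.normal(0.0, 0.02, (size, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, (hidden, size)), "b2": np.zeros(size),
        "wg": rng.normal(0.0, 0.02, (size, size)), "bg": np.zeros(size),
    }

def mix_along(z, axis, p):
    """Norm -> MLP -> gate along one axis of z, added back as a residual."""
    x = np.moveaxis(z, axis, -1)                 # bring the mixed axis last
    h = layer_norm(x)
    h = gelu(h @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]
    h = h * softmax(h @ p["wg"] + p["bg"])       # assumed simple gating: reweight features
    return z + np.moveaxis(h, -1, axis)          # residual connection

def mixer_block(z, params):
    """z has shape (channels, num_patches, patch_len)."""
    z = mix_along(z, 1, params["patch"])     # mixing across patches
    z = mix_along(z, 2, params["feature"])   # mixing within each patch
    z = mix_along(z, 0, params["channel"])   # mixing across channels
    return z

rng = np.random.default_rng(0)
series = rng.normal(size=(3, 32))                      # 3 channels, 32 time steps
patches = make_patches(series, patch_len=8, stride=8)  # shape (3, 4, 8)
params = {
    "patch": init_params(rng, 4, 16),
    "feature": init_params(rng, 8, 16),
    "channel": init_params(rng, 3, 16),
}
out = mixer_block(patches, params)
print(out.shape)
```

Because every step is a dense layer applied along one axis, the cost grows linearly with the number of patches rather than quadratically as in self-attention, which is the intuition behind the memory and runtime savings claimed above. For actual experiments, the released PatchTSMixer implementation linked in the abstract is the appropriate starting point.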