TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting

From arXiv. Accepted in the Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), Research Track. The arXiv release was delayed to comply with the conference's double-blind review policy. The paper was submitted to the KDD peer-review process on Feb 02, 2023.

Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and compute requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture composed exclusively of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting the vision MLP-Mixer to time series and introduce empirically validated components to enhance accuracy. These include a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone to explicitly model time-series properties such as hierarchy and channel correlations. We also propose a hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series foundation models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).
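The abstract names three ingredients: patched input, MLP-only mixing, and a simple gated attention. The following is a minimal pure-Python sketch of how these might fit together for a single channel, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the ReLU activation, the softmax form of the gate, the split into inter-patch and intra-patch mixing, and the random weights standing in for learned parameters. Normalization, biases, the reconciliation heads, and cross-channel modeling are omitted.

```python
import math
import random

def make_patches(series, patch_len, stride):
    """Split a 1-D series into patches of length patch_len, stepping by stride."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]

def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(x, w):
    """Apply weight matrix w (a list of rows) to vector x; bias omitted for brevity."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def gated_attention(x, w_gate):
    """Assumed gate: rescale each feature by a softmax weight computed from x."""
    gate = softmax(linear(x, w_gate))
    return [g * xi for g, xi in zip(gate, x)]

def mlp_block(x, w1, w2, w_gate):
    """Two-layer MLP with ReLU, followed by the gate and a residual connection."""
    hidden = [max(0.0, h) for h in linear(x, w1)]
    out = gated_attention(linear(hidden, w2), w_gate)
    return [a + b for a, b in zip(x, out)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def mixer_layer(patches, params):
    """Mix across patches (per within-patch position), then within each patch."""
    # Each column holds one within-patch position across all patches.
    cols = [mlp_block(c, *params["inter"]) for c in transpose(patches)]
    mixed = transpose(cols)
    # Each row is one patch; mix its own time steps.
    return [mlp_block(p, *params["intra"]) for p in mixed]

def init_params(num_patches, patch_len, hidden, seed=0):
    """Build random weights for the inter-patch and intra-patch MLP blocks."""
    rng = random.Random(seed)

    def block(dim):
        return (rand_matrix(hidden, dim, rng),
                rand_matrix(dim, hidden, rng),
                rand_matrix(dim, dim, rng))

    return {"inter": block(num_patches), "intra": block(patch_len)}

# Usage: a 32-step toy series cut into 4 non-overlapping patches of length 8.
series = [math.sin(0.3 * t) for t in range(32)]
patches = make_patches(series, patch_len=8, stride=8)
params = init_params(len(patches), 8, hidden=16)
mixed = mixer_layer(patches, params)
```

The layer preserves the patch grid shape, so several such layers could be stacked before a forecasting head. Because every operation is a small dense product, cost grows with the number of patches and the patch length rather than with pairwise attention over all time steps.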

