TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series

Large Pretrained models for Zero/Few-shot learning excel in language and vision domains but encounter challenges in multivariate time series (TS) due to the diverse nature and scarcity of publicly available pretraining data. Consequently, there has been a recent surge in utilizing pretrained large language models (LLMs) with various adaptations for time series forecasting. These approaches employ cross-domain transfer learning, yielding highly impressive results. However, these models are typically very large ($\sim$ billion parameters), exhibit slow execution, and do not consider cross-channel correlations. To address this, we present Multi-level Tiny Time Mixers (TTM), a significantly smaller model based on the lightweight TSMixer architecture. TTM marks the first success in developing tiny pretrained models ($\le$1 million parameters), exclusively trained on public TS data with effective transfer learning capabilities. To tackle the complexity of pretraining on multiple datasets with varied temporal resolutions, we introduce several novel enhancements such as adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to effectively model channel correlations and incorporate exogenous signals during finetuning, a crucial capability lacking in existing benchmarks. TTM excels in few/zero-shot forecasting, demonstrating significant accuracy gains (12-38%) over existing benchmarks. Further, it achieves a remarkable 14-106X reduction in model parameters, enabling 54-65X faster training/inference as compared to the LLM-TS benchmarks. In fact, TTM's zero-shot results often surpass the few-shot results in many benchmarks, highlighting the efficacy of our approach. Code and Pretrained Models will be open-sourced.

翻译：大型预训练模型在零样本/小样本学习中在语言和视觉领域表现出色，但因多变量时间序列的多样性和公开预训练数据的稀缺性而面临挑战。为此，近期涌现出利用预训练大型语言模型（LLM）适配时间序列预测的研究，通过跨域迁移学习取得显著成果。然而，这些模型通常规模庞大（约十亿参数）、执行缓慢且未考虑跨通道相关性。针对上述问题，我们提出基于轻量级TSMixer架构的微型时间混合器（TTM）——首个参数不超过100万的微型预训练模型，仅使用公开时间序列数据训练并具备有效迁移学习能力。为应对多数据集不同时间分辨率的预训练复杂性，我们引入自适应分块、下采样数据增强和分辨率前缀调优等创新技术。同时采用多层建模策略有效建模通道相关性，并在微调中整合外生信号——这一关键能力在现有基准中缺失。TTM在零样本/小样本预测中表现卓越，相比现有基准实现12-38%的精度提升；模型参数减少14-106倍，训练/推理速度提升54-65倍。事实上，TTM的零样本结果在许多基准中超越了小样本结果，凸显了方法的有效性。代码与预训练模型将开源。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日