Large pretrained models for zero/few-shot learning excel in the language and vision domains but struggle with multivariate time series (TS) because publicly available pretraining data is both diverse and scarce. Consequently, there has been a recent surge in adapting pretrained large language models (LLMs) to time series forecasting. These approaches rely on cross-domain transfer learning and yield impressive results. However, these models are typically very large ($\sim$ billion parameters), slow to execute, and do not consider cross-channel correlations. To address this, we present Multi-level Tiny Time Mixers (TTM), a significantly smaller model based on the lightweight TSMixer architecture. TTM marks the first success in developing tiny pretrained models ($\le$1 million parameters) that are trained exclusively on public TS data and still transfer effectively. To tackle the complexity of pretraining on multiple datasets with varied temporal resolutions, we introduce several novel enhancements: adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to model channel correlations and incorporate exogenous signals during finetuning, a crucial capability lacking in existing benchmarks. TTM excels in few/zero-shot forecasting, with accuracy gains of 12-38% over existing benchmarks. Further, it reduces model parameters by 14-106X, enabling 54-65X faster training/inference compared to the LLM-TS benchmarks. In fact, TTM's zero-shot results often surpass the few-shot results in many benchmarks, highlighting the efficacy of our approach. Code and pretrained models will be open-sourced.
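As a rough illustration of two of the ideas named above, dataset augmentation via downsampling and patching of the input window, the following minimal sketch may help. It is not the TTM implementation: the function names (`downsample`, `augment_by_downsampling`, `to_patches`), the use of block averaging as the downsampling scheme, and the fixed (non-adaptive) patch length are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def downsample(series: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping blocks of `factor` steps along the time axis.

    `series` has shape (time, channels). Block averaging is an assumed
    scheme; the abstract does not say how downsampling is performed.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = (series.shape[0] // factor) * factor  # drop the ragged tail
    blocks = series[:usable].reshape(-1, factor, series.shape[1])
    return blocks.mean(axis=1)


def augment_by_downsampling(series: np.ndarray, factors: list[int]) -> dict[int, np.ndarray]:
    """Derive one lower-resolution copy of `series` per downsampling factor."""
    return {k: downsample(series, k) for k in factors}


def to_patches(window: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (time, channels) window into non-overlapping patches.

    Returns an array of shape (channels, num_patches, patch_len).
    A fixed patch length is used here; adaptive patching is not modeled.
    """
    steps, channels = window.shape
    if steps % patch_len != 0:
        raise ValueError("window length must be a multiple of patch_len")
    return window.T.reshape(channels, steps // patch_len, patch_len)


if __name__ == "__main__":
    # Toy series: 12 time steps, 2 channels.
    series = np.arange(24, dtype=float).reshape(12, 2)
    for factor, copy in augment_by_downsampling(series, [1, 2, 4]).items():
        print(factor, copy.shape)
    print(to_patches(series, patch_len=4).shape)
```

Under these assumptions, each downsampled copy acts as an additional lower-resolution training series derived from the same source, which is one plausible way to enlarge a scarce pretraining corpus across resolutions.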