Time series foundation models have demonstrated strong performance in zero-shot learning, making them well-suited for predicting rapidly evolving patterns in real-world applications where relevant training data are scarce. However, most of these models rely on the Transformer architecture, whose cost grows quadratically with input length. To address this, we introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture. The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy. To reduce reliance on large datasets and lower training costs, TSMamba employs a two-stage transfer learning process that leverages pretrained Mamba LLMs, allowing effective time series modeling with a moderate training set. In the first stage, the forward and backward backbones are optimized via patch-wise autoregressive prediction; in the second stage, the model trains a prediction head and refines other components for long-term forecasting. While the backbone assumes channel independence to handle varying numbers of channels across datasets, a channel-wise compressed attention module is introduced to capture cross-channel dependencies during fine-tuning on specific multivariate datasets. Experiments show that TSMamba's zero-shot performance is comparable to state-of-the-art time series foundation models, despite using significantly less training data. It also achieves competitive or superior full-shot performance compared to task-specific prediction models. The code will be made publicly available.
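The linear-complexity, bidirectional encoding described above can be illustrated with a toy sketch. This is not the paper's implementation: Mamba uses input-dependent selective state-space blocks, whereas the recurrence below is a fixed linear scan chosen only to show the O(L) single-pass structure and how a forward and a backward pass over the sequence are combined; all function names and shapes are illustrative assumptions.

```python
import numpy as np

def linear_scan(x, a=0.9, b=1.0):
    """Toy linear-time recurrence h_t = a*h_{t-1} + b*x_t.
    Stands in for a sequence-model block: one pass over the series,
    O(L) in length L, versus O(L^2) for full self-attention."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = a * h + b * x[t]
        out[t] = h
    return out

def bidirectional_encode(x):
    """Run one scan forward and one over the reversed sequence,
    mirroring the forward/backward encoder idea (details assumed)."""
    fwd = linear_scan(x)
    bwd = linear_scan(x[::-1])[::-1]  # backward pass, re-aligned to time order
    return np.concatenate([fwd, bwd], axis=-1)  # (L, 2D)

x = np.random.default_rng(0).standard_normal((96, 8))  # L=96 steps, D=8 features
print(bidirectional_encode(x).shape)  # (96, 16)
```

The key property the sketch preserves is that cost scales linearly with the lookback window, so long input contexts stay affordable.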
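The channel-wise compressed attention module is described only at a high level, but its motivation admits a minimal sketch: rather than attending among all C channels (quadratic in C), each channel attends to a small number k of compressed channel summaries, keeping the cost linear in C. Everything below is a hypothetical illustration, with random matrices standing in for learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_compressed_attention(x, k=4, seed=0):
    """Hypothetical sketch: compress C per-channel embeddings to k summary
    tokens, let each channel attend to the summaries, add a residual.
    x: (C, D) per-channel feature vectors."""
    rng = np.random.default_rng(seed)
    c, d = x.shape
    # Learned in practice; random here purely for illustration.
    w_comp = rng.standard_normal((c, k)) / np.sqrt(c)  # channel-compression weights
    wq = rng.standard_normal((d, d)) / np.sqrt(d)
    wk = rng.standard_normal((d, d)) / np.sqrt(d)
    wv = rng.standard_normal((d, d)) / np.sqrt(d)
    z = w_comp.T @ x                     # (k, D) compressed channel summaries
    q, ks, v = x @ wq, z @ wk, z @ wv    # queries per channel; keys/values from summaries
    attn = softmax(q @ ks.T / np.sqrt(d))  # (C, k): cost linear in C, not quadratic
    return x + attn @ v                  # residual mixing of cross-channel information

x = np.random.default_rng(1).standard_normal((7, 16))
print(channel_compressed_attention(x).shape)  # (7, 16)
```

Because the backbone itself stays channel-independent, a module like this can be bolted on during fine-tuning without changing how the pretrained encoders handle datasets with different channel counts.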