Learning time series foundation models has been shown to be a promising approach for zero-shot time series forecasting across diverse domains. Because scaling has been a critical driver of performance for foundation models in other modalities such as language and vision, much recent work on time series foundation modeling has likewise focused on scaling. This has produced time series foundation models with hundreds of millions of parameters that, while performant, are inefficient and expensive to use in practice. This paper describes a simple recipe for learning efficient foundation models for zero-shot time series forecasting that are orders of magnitude smaller. We show that large-scale transformers are not necessary: small hybrid models that interleave long convolution and linear RNN layers (in particular, DeltaNet layers) can match the performance of larger transformer-based models while being more than a hundred times smaller. We also describe several data augmentation and inference strategies that further improve performance. This recipe yields Reverso, a family of efficient time series foundation models for zero-shot forecasting that significantly advance the performance-efficiency Pareto frontier.
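To make the architectural idea concrete, below is a minimal sketch of a hybrid block that interleaves a long convolution layer with a DeltaNet-style linear RNN layer. Every specific choice here is an illustrative assumption rather than the actual Reverso architecture: the FFT-based depthwise long convolution, the single-head delta-rule recurrence written as an explicit loop (efficient implementations use a chunked parallel scan), the pre-norm residual wiring, and all names and dimensions such as `HybridBlock` and `max_len` are hypothetical.

```python
# Illustrative sketch only: a hybrid block interleaving a long convolution
# with a DeltaNet-style linear RNN. Not the actual Reverso architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongConv(nn.Module):
    """Depthwise convolution whose kernel spans the whole sequence,
    applied via FFT in O(T log T) (one common parameterization of
    long convolutions; assumed here, not taken from the paper)."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.kernel = nn.Parameter(0.01 * torch.randn(d_model, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        T = x.size(1)
        n = 2 * T  # zero-pad so circular convolution equals linear convolution
        k_f = torch.fft.rfft(self.kernel[:, :T], n=n)     # (D, n//2 + 1)
        x_f = torch.fft.rfft(x.transpose(1, 2), n=n)      # (B, D, n//2 + 1)
        y = torch.fft.irfft(x_f * k_f, n=n)[..., :T]      # keep causal outputs
        return y.transpose(1, 2)


class DeltaNetLayer(nn.Module):
    """Single-head linear RNN with the delta-rule state update
    S_t = S_{t-1} + beta_t (v_t - S_{t-1} k_t) k_t^T and output o_t = S_t q_t,
    written as a sequential loop for clarity."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.beta = nn.Linear(d_model, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        B, T, D = x.shape
        q = self.q(x)
        k = F.normalize(self.k(x), dim=-1)   # unit-norm keys
        v = self.v(x)
        beta = torch.sigmoid(self.beta(x))   # write strength in (0, 1)
        S = x.new_zeros(B, D, D)             # fixed-size recurrent state
        out = []
        for t in range(T):
            kt, vt, qt = k[:, t], v[:, t], q[:, t]           # each (B, D)
            pred = torch.einsum("bij,bj->bi", S, kt)          # S_{t-1} k_t
            S = S + beta[:, t].unsqueeze(-1) * torch.einsum(
                "bi,bj->bij", vt - pred, kt)                  # delta-rule write
            out.append(torch.einsum("bij,bj->bi", S, qt))     # read with q_t
        return torch.stack(out, dim=1)


class HybridBlock(nn.Module):
    """Pre-norm residual block interleaving the two sequence mixers."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.n1, self.n2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.conv = LongConv(d_model, max_len)
        self.rnn = DeltaNetLayer(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.conv(self.n1(x))
        x = x + self.rnn(self.n2(x))
        return x


if __name__ == "__main__":
    # Stack two hybrid blocks and run a dummy batch through them.
    model = nn.Sequential(*[HybridBlock(d_model=64, max_len=256) for _ in range(2)])
    print(model(torch.randn(4, 128, 64)).shape)  # torch.Size([4, 128, 64])
```

The intuition behind pairing these two mixers: the long convolution captures global, position-based patterns such as seasonality at sub-quadratic cost, while the delta-rule recurrence maintains a compact content-addressed state that can be updated and queried token by token, so neither layer needs the quadratic attention of a large transformer.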