Time series forecasting is a fundamental problem with applications in climate, energy, healthcare, and finance. Many existing approaches require domain-specific feature engineering and substantial labeled data for each task. We introduce PatchFormer, a patch-based time series foundation model that uses hierarchical masked reconstruction for self-supervised pretraining and lightweight adapters for efficient transfer. PatchFormer segments time series into patches and learns multiscale temporal representations with learnable aggregation across temporal scales. Pretraining uses masked patch reconstruction with dynamic masking and objectives that encourage both local accuracy and global consistency, followed by cross-domain knowledge distillation. Experiments on 24 benchmark datasets spanning weather, energy, traffic, finance, and healthcare demonstrate state-of-the-art zero-shot multi-horizon forecasting, reducing mean squared error by 27.3 percent relative to strong baselines while requiring 94 percent less task-specific training data. The model exhibits near log-linear scaling with more pretraining data up to 100 billion points and processes length-512 sequences 3.8x faster than full-sequence transformers.
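The patching and masked-reconstruction setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch length (16), mask ratio (0.4), and the stand-in "reconstruction" are all assumed values chosen for the example, since the abstract does not specify them.

```python
import numpy as np

def patchify(series, patch_len=16):
    # Segment a 1-D series into non-overlapping patches.
    # patch_len=16 is an illustrative choice, not from the paper.
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def masked_reconstruction_loss(patches, reconstruction, mask):
    # MSE over masked patches only -- the "local accuracy" part of the
    # pretraining objective; global-consistency terms are omitted here.
    masked = mask.astype(bool)
    diff = patches[masked] - reconstruction[masked]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 512)) + 0.1 * rng.standard_normal(512)

patches = patchify(series)              # shape (32, 16) for length-512 input
mask = rng.random(len(patches)) < 0.4   # random per-step mask; 0.4 ratio assumed
# Stand-in for the model's output: true patches plus small noise.
recon = patches + 0.05 * rng.standard_normal(patches.shape)
loss = masked_reconstruction_loss(patches, recon, mask)
```

In the actual model the masked patches would be replaced by learnable mask tokens and reconstructed by the transformer; here the noisy copy simply makes the loss computation concrete.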