The recent surge in Time Series Foundation Models has rapidly advanced the field, yet heterogeneous training setups across studies make it difficult to attribute improvements to architectural innovation rather than data engineering. In this work, we investigate the potential of a standard patch Transformer and demonstrate that this generic architecture achieves state-of-the-art zero-shot forecasting performance under a straightforward training protocol. We conduct a comprehensive ablation study covering model scaling, data composition, and training techniques to isolate the ingredients essential for high performance. Our findings pinpoint the key drivers of performance and confirm that the generic architecture itself scales well. By strictly controlling these variables, we provide controlled empirical results on model scaling across multiple dimensions. We release our open-source model and detailed findings to establish a transparent, reproducible baseline for future research.
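To make "standard patch Transformer" concrete, the sketch below shows what such a forecaster typically looks like: the input series is cut into fixed-length patches, each patch is linearly embedded as a token, a vanilla Transformer encoder mixes the tokens, and a linear head maps the final hidden state to the forecast horizon. This is a minimal illustration only; the class name `PatchTransformerForecaster` and all hyperparameters are assumptions for exposition, not the released model's configuration.

```python
import torch
import torch.nn as nn

class PatchTransformerForecaster(nn.Module):
    """Minimal patch Transformer for univariate forecasting (illustrative sketch).

    Assumed design: non-overlapping patches -> linear embedding ->
    learned positional embedding -> standard Transformer encoder ->
    linear head on the last token. Hyperparameters are placeholders.
    """

    def __init__(self, patch_len=32, d_model=256, n_heads=8,
                 n_layers=4, horizon=96, max_patches=64):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)           # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, max_patches, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, horizon)              # token -> forecast

    def forward(self, x):
        # x: (batch, context_len); context_len must be divisible by patch_len
        b, t = x.shape
        patches = x.reshape(b, t // self.patch_len, self.patch_len)
        tokens = self.embed(patches)                         # (b, n_patches, d_model)
        tokens = tokens + self.pos[:, :tokens.size(1)]       # add positions
        hidden = self.encoder(tokens)
        return self.head(hidden[:, -1])                      # (b, horizon)

model = PatchTransformerForecaster()
context = torch.randn(8, 512)   # 8 series, 512-step context window
forecast = model(context)       # -> shape (8, 96)
```

The point of the sketch is that nothing here is architecture-specific to time series beyond the patching step itself, which is why the abstract's question of attributing gains to data and training rather than architecture is well posed.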