FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

Driven by the transition towards a climate-neutral energy system, accurate energy time series forecasting is critical for planning and operation. Yet, it remains largely a dataset-specific task, requiring comprehensive training data, limiting scalability, and resulting in high model development and maintenance effort. Recently, foundation models that aim to learn generalizable patterns via extensive pretraining have shown superior performance in multiple prediction tasks. Despite their success and strong potential to address challenges in energy forecasting, their application in this domain remains largely unexplored. We address this gap by presenting the Foundation Models in Energy Time Series Forecasting (FETS) benchmark. We (1) provide a structured overview of energy forecasting use cases along three main dimensions: stakeholders, attributes, and data categories; (2) collect and analyze 54 datasets across 9 data categories, guided by typical stakeholder interests; (3) benchmark foundation models against classical machine learning approaches across different forecasting settings. Foundation models consistently outperform dataset-specific optimized machine learning approaches across all settings and data categories, despite the latter having seen the full historic target data during training. In particular, covariate-informed foundation models achieve the strongest performance. Further analysis reveals a strong correlation between predictive performance and spectral entropy, performance saturation beyond a certain context length, and improved performance at higher aggregation levels such as national load, district heating, and power grid data. Overall, our findings highlight the strong potential of foundation models as scalable and generalizable forecasting solutions for the energy domain, particularly in data-constrained and privacy-sensitive settings.
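The abstract does not say how spectral entropy is computed, so the following is only a minimal sketch of one common definition: the Shannon entropy of the normalized power spectrum, scaled to [0, 1]. The function name `spectral_entropy`, the plain FFT periodogram, and the normalization by the log of the number of frequency bins are all assumptions for illustration, not the benchmark's actual implementation. Under this definition, values near 0 indicate a series dominated by a few periodic components, and values near 1 indicate a noise-like series.

```python
import numpy as np


def spectral_entropy(series, normalize=True):
    """Shannon entropy of the normalized power spectrum of a 1-D series.

    Illustrative definition only; the benchmark may use a different estimator.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2   # unnormalized periodogram
    power = power[1:]                     # drop the zero-frequency bin
    total = power.sum()
    if power.size < 2 or total == 0.0:
        return 0.0                        # too short or constant: no spectral spread
    p = power / total                     # treat the spectrum as a probability distribution
    p = p[p > 0]                          # skip empty bins to avoid log(0)
    entropy = -np.sum(p * np.log(p))
    if normalize:
        entropy /= np.log(power.size)     # scale to [0, 1]
    return float(entropy)


# Usage: a pure sine should score far lower than white noise.
t = np.arange(1024)
periodic = np.sin(2 * np.pi * t / 64)
noise = np.random.default_rng(0).standard_normal(1024)
print(spectral_entropy(periodic), spectral_entropy(noise))
```

A score like this could then be correlated with per-dataset forecast error, which is the kind of relationship the abstract reports.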
