The recent development of foundation models for time series data has generated considerable interest in applying such models across a variety of domains. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, even though calibration can be critical for many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), the effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than baseline models and tend not to be systematically over- or under-confident, in contrast to the overconfidence often observed in other deep learning models.
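The notion of calibration assessed here can be made concrete: a probabilistic forecaster's central prediction intervals are well calibrated when their empirical coverage matches the nominal level. Coverage below nominal indicates over-confidence (intervals too narrow); coverage above nominal indicates under-confidence. A minimal sketch of such a coverage check on synthetic data (illustrative only, not the paper's evaluation code):

```python
import numpy as np

def empirical_coverage(y_true, q_lo, q_hi):
    """Fraction of observations falling inside the [q_lo, q_hi] interval forecasts."""
    return float(np.mean((y_true >= q_lo) & (y_true <= q_hi)))

# Synthetic example: observations actually drawn from the forecaster's
# predictive distribution N(0, 1), so its 80% central interval
# (quantiles 0.1 and 0.9) should cover about 80% of outcomes.
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
lo, hi = -1.2816, 1.2816  # N(0,1) quantiles at levels 0.1 and 0.9
cov = empirical_coverage(y, lo, hi)

# cov close to the nominal 0.80 indicates good calibration;
# markedly lower => over-confident, markedly higher => under-confident.
```

Comparing empirical against nominal coverage across several levels (e.g., 50%, 80%, 95%) gives a simple calibration profile of the kind the evaluations above probe.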