Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, that is, how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short), which measures the quality of probabilistic multi-horizon forecasts by accounting for both multi-horizon accuracy and stability. The score also accepts user-specified weights that balance accuracy against consistency. As an example application, we implement the score as a differentiable objective function for training seasonal autoregressive integrated models and evaluate it on the M4 Hourly benchmark dataset. The results show substantial improvements over traditional maximum likelihood estimation (MLE). In terms of stability, the AC-optimized model produced out-of-sample forecasts with 91.1% lower vertical variance than the MLE-fitted model. In terms of accuracy, it improved considerably on medium- to long-horizon forecasts: one-step-ahead MAPE increased by 7.5%, but MAPE improved at every subsequent horizon, by up to 26%. These results indicate that training on our score yields more stable and accurate multi-step forecasts in exchange for some degradation in one-step-ahead performance.
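To make the two reported quantities concrete, the following is a minimal sketch of how per-horizon MAPE and vertical variance could be computed from rolling-origin point forecasts. It is not the forecast AC score itself, which is defined on probabilistic forecasts and is not specified in this abstract. The function names, the array layout, and the reading of "vertical variance" as the variance of all forecasts issued for the same target time from different origins are assumptions made for illustration.

```python
import numpy as np


def mape_by_horizon(forecasts, actuals):
    """Return the MAPE (in percent) for each horizon h = 1..H.

    Assumed layout (hypothetical, not taken from the paper):
    forecasts[i, h-1] is the point forecast issued at origin i for time i + h,
    and actuals[t] is the observed value at time t, with
    len(actuals) >= n_origins + H and no zero target values.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    n_origins, n_horizons = forecasts.shape
    out = np.empty(n_horizons)
    for h in range(1, n_horizons + 1):
        # Targets y[i + h] for origins i = 0..n_origins-1.
        targets = actuals[h : h + n_origins]
        errors = np.abs((targets - forecasts[:, h - 1]) / targets)
        out[h - 1] = 100.0 * errors.mean()
    return out


def mean_vertical_variance(forecasts):
    """Average, over target times, of the variance of forecasts for that time.

    Assumption: "vertical" groups together all forecasts of the same target
    time t that were issued from different origins. Target times covered by
    fewer than two forecasts are skipped.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    n_origins, n_horizons = forecasts.shape
    variances = []
    for t in range(1, n_origins + n_horizons):
        # Origins i that forecast time t satisfy 1 <= t - i <= H.
        first = max(0, t - n_horizons)
        last = min(n_origins - 1, t - 1)
        if last - first + 1 < 2:
            continue
        values = [forecasts[i, t - i - 1] for i in range(first, last + 1)]
        variances.append(np.var(values))
    if not variances:
        return float("nan")
    return float(np.mean(variances))
```

Under this layout, a model whose forecast for a given target time never changes as the origin moves has a vertical variance of zero, regardless of how accurate those forecasts are. That separation is why accuracy and stability have to be measured, and weighted, as distinct terms.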