We introduce inference methods for score decompositions, which partition scoring functions for predictive assessment into three interpretable components: miscalibration, discrimination, and uncertainty. Our estimation and inference rely on a linear recalibration of the forecasts, which is valid for both smooth and non-smooth scoring functions and therefore applies to general multi-step-ahead point forecasts such as means and quantiles. This approach ensures desirable finite-sample properties, enables asymptotic inference, and establishes a direct connection to the classical Mincer-Zarnowitz regression. The resulting inference framework facilitates tests for equal forecast calibration or discrimination, which offer three key advantages: they enhance the information content of predictive ability tests by decomposing scores, deliver higher statistical power in certain scenarios, and formally connect scoring-function-based evaluation to traditional calibration tests, such as financial backtests. Two applications demonstrate the method's utility. For survey inflation forecasts, we find that discrimination abilities can differ significantly even when overall predictive ability does not. For financial risk models, our tests provide deeper insights into the calibration and information content of volatility and Value-at-Risk forecasts. By disentangling forecast accuracy from backtest performance, the method exposes critical shortcomings in current banking regulation.
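The abstract does not spell out the estimator, so the following is only a minimal sketch of how a score decomposition based on linear recalibration might look in the simplest case: mean forecasts evaluated with squared error. It assumes the common convention score = miscalibration - discrimination + uncertainty, with the recalibrated forecast obtained from an in-sample Mincer-Zarnowitz regression of outcomes on forecasts. The function name `mse_decomposition` and the dictionary keys are hypothetical, and the inference step (standard errors and tests) is not covered here.

```python
import numpy as np


def mse_decomposition(x, y):
    """Decompose the mean squared error of forecasts x for outcomes y.

    Illustrative sketch only: uses an OLS (Mincer-Zarnowitz) fit as the
    linear recalibration and the sample mean of y as the reference forecast.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Mincer-Zarnowitz regression: y = a + b * x + error, fitted by OLS.
    design = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    a, b = coef
    recalibrated = a + b * x

    # Average scores of the original, recalibrated, and reference forecasts.
    score = np.mean((y - x) ** 2)
    score_recal = np.mean((y - recalibrated) ** 2)
    score_ref = np.mean((y - y.mean()) ** 2)

    return {
        "score": score,
        "mcb": score - score_recal,      # miscalibration
        "dsc": score_ref - score_recal,  # discrimination
        "unc": score_ref,                # uncertainty
        "intercept": a,
        "slope": b,
    }


# Usage on simulated data (the data-generating process is made up).
rng = np.random.default_rng(0)
forecasts = rng.normal(size=500)
outcomes = 0.5 + 0.8 * forecasts + rng.normal(size=500)
result = mse_decomposition(forecasts, outcomes)
print(result)
```

Because both the original forecast (intercept 0, slope 1) and the constant reference forecast (slope 0) are special cases of the fitted linear model, the in-sample miscalibration and discrimination terms in this sketch are non-negative by construction, which is one way to read the finite-sample properties mentioned above.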