Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration (ensuring forecasted probabilities match empirical frequencies) is essential. Although the common notion of Expected Calibration Error (ECE) provides actionable insights for decision making, it is not testable: it cannot be empirically estimated in many practical cases. Conversely, the recently proposed Distance from Calibration (dCE) is testable but is not actionable, since it lacks the decision-theoretic guarantees needed for high-stakes applications. We introduce Cutoff Calibration Error, a calibration measure that bridges this gap by assessing calibration over intervals of forecasted probabilities. We show that Cutoff Calibration Error is both testable and actionable, and we examine its implications for popular post-hoc calibration methods, such as isotonic regression and Platt scaling.