Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration (ensuring that forecasted probabilities match empirical frequencies) is essential. Although the common notion of Expected Calibration Error (ECE) provides actionable insights for decision making, it is not testable: it cannot be empirically estimated in many practical cases. Conversely, the recently proposed Distance from Calibration (dCE) is testable, but it is not actionable, since it lacks the decision-theoretic guarantees needed for high-stakes applications. To resolve this tension, we consider Cutoff Calibration Error, a calibration measure that bridges this gap by assessing calibration over intervals of forecasted probabilities. We show that Cutoff Calibration Error is both testable and actionable, and we examine its implications for popular post-hoc calibration methods such as isotonic regression and Platt scaling.
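To make the contrast between the two kinds of measures concrete, below is a minimal Python sketch comparing standard binned ECE with an interval-based empirical proxy for Cutoff Calibration Error: the worst-case average residual between labels and forecasts over any interval of forecast values. The function names are hypothetical, and the proxy is only one plausible reading of "assessing calibration over intervals of forecasted probabilities"; the paper's formal definition may differ.

```python
import numpy as np

def binned_ece(probs, labels, n_bins=10):
    """Binned Expected Calibration Error: weighted average gap between
    mean forecast and empirical frequency within equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

def cutoff_calibration_error_proxy(probs, labels):
    """Hypothetical empirical proxy for Cutoff Calibration Error:
    sup over intervals [a, b] of the absolute average residual
    (label - forecast) among samples whose forecast falls in [a, b].
    Intervals of forecast values are contiguous runs once forecasts
    are sorted, so the sup reduces to a prefix-sum range."""
    order = np.argsort(probs)
    resid = (labels - probs)[order]
    prefix = np.concatenate([[0.0], np.cumsum(resid)])
    # max over i < j of |prefix[j] - prefix[i]| equals max(prefix) - min(prefix)
    return (prefix.max() - prefix.min()) / len(resid)

# Usage: synthetic, well-calibrated forecasts, so both errors should be small.
rng = np.random.default_rng(0)
probs = rng.uniform(size=1000)
labels = rng.binomial(1, probs)
print(binned_ece(probs, labels), cutoff_calibration_error_proxy(probs, labels))
```

Note that the proxy needs no binning choice: it scans all intervals of observed forecast values at once, which is one intuition for why an interval-based measure can be estimated (tested) without the bin-sensitivity issues that make ECE hard to estimate empirically.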