Proper scoring rules evaluate the quality of probabilistic predictions and play an essential role in the pursuit of accurate and well-calibrated models. Every proper score decomposes, via a Bregman divergence, into two fundamental components: proper calibration error and refinement. While uncertainty calibration has gained significant attention, the current literature lacks a general estimator for these quantities with known statistical properties. To address this gap, we propose a method that allows consistent and asymptotically unbiased estimation of all proper calibration errors and refinement terms. In particular, we introduce the Kullback--Leibler calibration error, induced by the commonly used cross-entropy loss. As part of our results, we prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks, regardless of which proper scoring rule is optimized. Our experiments empirically validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
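To make the decomposition concrete for the cross-entropy case, the following is a minimal sketch and not the estimator proposed in this work. It assumes the predictions take only finitely many distinct values, so that the calibrated prediction can be approximated by the empirical label frequencies within each group of identical predictions. Under that assumption the mean log loss splits exactly into a Kullback--Leibler calibration term and an entropy (refinement) term. The function name `kl_decomposition` and its arguments are hypothetical, and this naive plug-in estimate carries none of the consistency or unbiasedness guarantees discussed in the abstract.

```python
import numpy as np

def kl_decomposition(probs, labels):
    """Plug-in split of mean log loss into KL calibration error + refinement.

    Hypothetical helper for illustration only: groups samples by identical
    predicted probability vectors and uses within-group label frequencies
    as a stand-in for the calibrated prediction.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n, k = probs.shape

    # Mean negative log-likelihood (cross-entropy) of the predictions.
    log_loss = -np.mean(np.log(probs[np.arange(n), labels]))

    # Group samples that share exactly the same predicted vector.
    uniq, inverse = np.unique(probs, axis=0, return_inverse=True)
    inverse = inverse.ravel()

    calibration, refinement = 0.0, 0.0
    for g in range(len(uniq)):
        mask = inverse == g
        weight = mask.mean()  # fraction of samples in this group
        freq = np.bincount(labels[mask], minlength=k) / mask.sum()
        nz = freq > 0  # skip zero-frequency classes (0 * log 0 = 0)
        # KL(empirical frequencies || predicted probabilities)
        calibration += weight * np.sum(freq[nz] * np.log(freq[nz] / uniq[g][nz]))
        # Shannon entropy of the empirical frequencies
        refinement += weight * -np.sum(freq[nz] * np.log(freq[nz]))
    return log_loss, calibration, refinement

# Toy data: the first group is perfectly calibrated, the second is not.
probs = [[0.8, 0.2]] * 5 + [[0.3, 0.7]] * 5
labels = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
log_loss, calibration, refinement = kl_decomposition(probs, labels)
```

On this toy data the calibration and refinement terms sum to the log loss up to floating-point error, and only the second group contributes to the calibration term. With continuous-valued predictions, exact grouping is no longer available and some form of binning or smoothing would be needed, which is where the statistical properties of the estimator become the central question.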