With model trustworthiness being crucial for sensitive real-world applications, practitioners increasingly focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the reliability of probabilistic predictions, but their estimators are usually biased and inconsistent. In this work, we introduce the framework of proper calibration errors, which relates every calibration error to a proper score and provides a corresponding upper bound with optimal estimation properties. This relationship can be used to reliably quantify improvements in model calibration. We demonstrate, theoretically and empirically, the shortcomings of commonly used estimators compared to our approach. Owing to the wide applicability of proper scores, this gives a natural extension of recalibration beyond classification.
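To make the claim about biased estimators concrete, the sketch below uses the common equal-width binned ECE estimator on a synthetic binary model that is perfectly calibrated by construction, so its true calibration error is zero. This is only an illustration of the bias of a standard estimator, not the proper calibration error or the upper bound proposed here. The helper names (`binned_ece`, `simulate_calibrated`, `mean_ece`), the 15-bin setting, and the use of the positive-class probability instead of top-label confidence are all illustrative assumptions.

```python
import random


def binned_ece(probs, labels, n_bins=15):
    """Equal-width binned ECE for binary predictions (illustrative assumption).

    probs: predicted probability of the positive class; labels: 0/1 outcomes.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # min() keeps p == 1.0 inside the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        mean_freq = sum(y for _, y in b) / len(b)
        # Weight each bin's |observed frequency - mean prediction| by its size.
        ece += len(b) / n * abs(mean_freq - mean_conf)
    return ece


def simulate_calibrated(n, rng):
    """Hypothetical perfectly calibrated model: P(Y = 1 | p) = p by construction."""
    probs = [rng.random() for _ in range(n)]
    labels = [1 if rng.random() < p else 0 for p in probs]
    return probs, labels


def mean_ece(n, trials=50, seed=0):
    """Average binned-ECE estimate over repeated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        probs, labels = simulate_calibrated(n, rng)
        total += binned_ece(probs, labels)
    return total / trials


if __name__ == "__main__":
    # The true calibration error is 0, yet the estimate stays positive
    # and shrinks only as the sample size grows.
    for n in (100, 1000, 10000):
        print(n, round(mean_ece(n), 4))
```

Because the absolute value inside each bin turns zero-mean sampling noise into a positive contribution, the estimate is biased upward for finite samples even when the model is exactly calibrated.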