This paper proposes a new metric for measuring the calibration error of probabilistic binary classifiers, called test-based calibration error (TCE). TCE incorporates a novel loss function based on a statistical test that examines the extent to which model predictions differ from probabilities estimated from data. It offers (i) a clear interpretation, (ii) a consistent scale that is unaffected by class imbalance, and (iii) an enhanced visual representation with respect to the standard reliability diagram. In addition, we introduce an optimality criterion for the binning procedure of calibration error metrics, based on minimizing the estimation error of the empirical probabilities, and we provide a novel computational algorithm for finding optimal bins under bin-size constraints. We demonstrate the properties of TCE through a range of experiments, including multiple real-world imbalanced datasets and ImageNet 1000.
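To make the idea of a test-based loss concrete, the following is a minimal sketch of one plausible reading of the description above, not the paper's reference implementation. It assumes that each prediction is tested against the empirical label counts of its bin with a two-sided exact binomial test, and that the metric is the percentage of predictions the test rejects. The function names `binom_pvalue` and `tce_sketch`, the equal-width bins, and the significance level `alpha=0.05` are all illustrative assumptions; in particular, equal-width bins stand in for the optimal binning procedure, which is not reproduced here.

```python
from math import comb


def binom_pvalue(k, n, p):
    """Two-sided exact binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under success probability p. Intended for small n;
    for large samples a library routine such as scipy.stats.binomtest
    would be a better choice.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def tce_sketch(probs, labels, n_bins=10, alpha=0.05):
    """Percentage of predictions rejected by a per-bin binomial test.

    Assumption: equal-width bins over [0, 1] (hypothetical stand-in for
    the optimal bins described in the abstract).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to last bin
        bins[idx].append((p, y))

    rejected = 0
    for members in bins:
        if not members:
            continue
        n = len(members)
        k = sum(y for _, y in members)  # observed positives in this bin
        for p, _ in members:
            # Reject when the bin's label counts are implausible under p.
            if binom_pvalue(k, n, p) < alpha:
                rejected += 1
    return 100.0 * rejected / len(probs)


if __name__ == "__main__":
    # Predictions of 0.5 with half the labels positive: nothing rejected.
    print(tce_sketch([0.5] * 20, [0, 1] * 10))
    # Predictions of 0.9 with no positive labels: everything rejected.
    print(tce_sketch([0.9] * 20, [0] * 20))
```

Under this reading, the score is a percentage between 0 and 100 regardless of how rare the positive class is, which is one way to interpret the claim of a consistent scale under class imbalance.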