Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training; it pushes the model to increase its confidence on the ground-truth class. However, we find that the CE loss has intrinsic limitations. For example, for a narrow misclassification (e.g., a test sample is wrongly classified and its softmax score on the ground-truth class is around 0.4), a calibrator trained with the CE loss often produces high confidence on the wrongly predicted class, which is undesirable. In this paper, we propose a new post-hoc calibration objective derived from the aim of calibration. Intuitively, the proposed objective asks the calibrator to decrease model confidence on wrongly predicted samples and to increase confidence on correctly predicted samples. Because a sample alone carries insufficient information to indicate whether its prediction is correct, we use its transformed versions (e.g., rotated, greyscaled and color-jittered) during calibrator training. Trained on an in-distribution validation set and tested on isolated, individual test samples, our method achieves calibration performance competitive with the state of the art on both in-distribution and out-of-distribution test sets. Further, our analysis points out the difference between our method and commonly used objectives such as the CE loss and the mean squared error loss, where the latter sometimes deviate from the calibration aim.
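The contrast between the CE loss and a correctness-driven objective can be illustrated with a minimal sketch. It is not the paper's actual formulation: the calibrator is assumed to be a single temperature, the function names `cross_entropy` and `correctness_aware_loss` are hypothetical, the loss form (confidence on wrong predictions, one minus confidence on correct ones) is an illustrative assumption, and the use of transformed inputs is omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels, temperature):
    """Mean CE loss: rewards raising the ground-truth class probability."""
    losses = [-math.log(softmax(logits, temperature)[y])
              for logits, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def correctness_aware_loss(batch_logits, labels, temperature):
    """Illustrative objective (an assumption, not the paper's loss):
    penalise confidence on wrong predictions and lack of confidence
    on correct ones."""
    losses = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits, temperature)
        conf = max(probs)              # confidence of the predicted class
        predicted = probs.index(conf)  # index of the predicted class
        losses.append(1.0 - conf if predicted == y else conf)
    return sum(losses) / len(losses)

# A narrow misclassification: the ground-truth class (index 1) scores 0.40,
# while the wrongly predicted class (index 0) scores 0.45 at temperature 1.
batch_logits = [[math.log(0.45), math.log(0.40), math.log(0.15)]]
labels = [1]

for t in (0.8, 1.0, 2.0):
    ce = cross_entropy(batch_logits, labels, t)
    ca = correctness_aware_loss(batch_logits, labels, t)
    print(f"T={t}: CE={ce:.3f}, correctness-aware={ca:.3f}")
```

On this sample, lowering the temperature below 1 reduces the CE loss while raising the confidence on the wrong class, whereas the illustrative correctness-driven loss decreases only when that confidence is lowered. This matches the behaviour described for narrow misclassifications.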