The calibration of deep neural networks is currently receiving widespread attention. Miscalibration usually manifests as model overconfidence. Under long-tailed data distributions, the problem is more pronounced because samples in minority and majority classes have different confidence levels, which results in more severe overconfidence. To address this, some recent studies design different temperature coefficients for different categories, building on the temperature scaling (TS) method. However, when minority classes contain few samples, the temperature coefficient does not generalize, and there is a large gap between the temperature coefficients of the training set and the validation set. To address this challenge, this paper proposes a dual-branch temperature scaling calibration model (Dual-TS), which simultaneously accounts for the diversity of temperature parameters across categories and the non-generalizability of temperature parameters for rare samples in minority classes. Moreover, we observe that the traditional calibration evaluation metric, Expected Calibration Error (ECE), gives a higher weight to low-confidence samples in the minority classes, which leads to inaccurate evaluation of model calibration. Therefore, we also propose Equal Sample Bin Expected Calibration Error (Esbin-ECE) as a new calibration evaluation metric. Experiments demonstrate that our model achieves state-of-the-art results on both the traditional ECE and Esbin-ECE metrics.
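The abstract names three ingredients without defining them: temperature scaling, equal-width ECE, and an equal-sample binning variant. The following is a minimal NumPy sketch of the generic versions of these ideas, not the paper's Dual-TS model. The function names are hypothetical, and `ece_equal_sample` assumes that "equal sample bin" means splitting the confidence-sorted samples into bins of (nearly) equal size; the abstract does not give the exact Esbin-ECE definition.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ece_equal_width(conf, correct, n_bins=10):
    """Standard ECE: bins are equal-width confidence intervals on [0, 1]."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; indices fall in 0 .. n_bins - 1.
    idx = np.digitize(conf, edges[1:-1], right=True)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            total += mask.mean() * gap  # weight = fraction of samples in the bin
    return total


def ece_equal_sample(conf, correct, n_bins=10):
    """Assumed equal-sample variant: each bin holds (nearly) the same number
    of samples, taken in order of increasing confidence."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(conf)
    total = 0.0
    for chunk in np.array_split(order, n_bins):
        if chunk.size:
            gap = abs(correct[chunk].mean() - conf[chunk].mean())
            total += chunk.size / conf.size * gap
    return total
```

In use, `conf` would be the maximum softmax probability per sample and `correct` a 0/1 array marking whether the predicted class matches the label. Comparing the two metrics on the same predictions shows how the binning scheme changes the weight each confidence region receives.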