Confidence calibration is central to providing accurate and interpretable uncertainty estimates, especially in safety-critical scenarios. However, we find that existing calibration algorithms often overlook proximity bias, a phenomenon in which models tend to be more overconfident on low-proximity samples (i.e., those lying in sparse regions of the data distribution) than on high-proximity samples, and therefore suffer from inconsistent miscalibration across samples of different proximity. We examine the problem on pretrained ImageNet models and observe that: 1) proximity bias exists across a wide variety of model architectures and sizes; 2) Transformer-based models are more susceptible to proximity bias than CNN-based models; 3) proximity bias persists even after applying popular calibration algorithms such as temperature scaling; 4) models tend to overfit more heavily on low-proximity samples than on high-proximity samples. Motivated by these empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee that adjusts sample confidence based on proximity. To further quantify how effectively calibration algorithms mitigate proximity bias, we introduce the proximity-informed expected calibration error (PIECE), together with a theoretical analysis. We show that ProCal is effective in addressing proximity bias and improving calibration in balanced, long-tail, and distribution-shift settings, under four metrics and over various model architectures.
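To make the notion of proximity bias concrete, the following is a minimal sketch of how one might check for it on held-out predictions. It is not ProCal or PIECE. The proximity measure used here (the exponential of the negative mean Euclidean distance to the k nearest neighbors in feature space), the function names `knn_proximity` and `confidence_gap_by_proximity`, and the equal-size grouping are all assumptions made for illustration.

```python
import numpy as np


def knn_proximity(features, k=10):
    """Assumed proximity: exp(-mean Euclidean distance to the k nearest other samples)."""
    features = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small illustration).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each sample from its own neighbor list.
    np.fill_diagonal(dists, np.inf)
    nearest = np.sort(dists, axis=1)[:, :k]
    return np.exp(-nearest.mean(axis=1))


def confidence_gap_by_proximity(confidences, correct, proximity, n_groups=2):
    """Mean confidence minus accuracy per equal-size proximity group, lowest proximity first."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Sort samples from sparse (low proximity) to dense (high proximity) regions.
    order = np.argsort(proximity)
    groups = np.array_split(order, n_groups)
    # A positive gap means overconfidence; a negative gap means underconfidence.
    return [float(confidences[g].mean() - correct[g].mean()) for g in groups]
```

Under these assumptions, a model exhibits proximity bias when the gap for the lowest-proximity group is clearly larger than the gap for the highest-proximity group, even if the gap averaged over all samples is close to zero.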