Neural network-based decisions tend to be overconfident: their raw output probabilities do not align with the true decision probabilities. Calibrating neural networks is therefore an essential step towards more reliable deep learning frameworks. Prior calibration error metrics primarily rely on crisp bin membership, which exacerbates the skew in model probabilities and portrays an incomplete picture of calibration error. In this work, we propose the Fuzzy Calibration Error metric (FCE), which uses a fuzzy binning approach to calculate calibration error. This approach alleviates the impact of probability skew and provides a tighter estimate of calibration error. We compare our metric with the Expected Calibration Error (ECE) across different data populations and class memberships. Our results show that FCE offers better calibration error estimation, especially in multi-class settings, by alleviating the effect of skewed model confidence scores on the estimate. We make our code and supplementary materials available at: \href{https://github.com/bihani-g/fce}{https://github.com/bihani-g/fce}
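To make the contrast between crisp and fuzzy binning concrete, the following is a minimal Python sketch and not the released implementation. It computes a standard equal-width ECE and a fuzzy-binned variant in which each confidence score is shared between neighbouring bins. The triangular membership function, the bin count, and the names `crisp_ece` and `fuzzy_ce` are assumptions made for illustration; the abstract does not specify the membership function used by FCE.

```python
import numpy as np


def crisp_ece(conf, correct, n_bins=10):
    """Equal-width ECE: every sample belongs to exactly one bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each confidence to a bin index in [0, n_bins - 1].
    idx = np.digitize(conf, edges[1:-1])
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            total += mask.sum() * gap
    return total / len(conf)


def fuzzy_ce(conf, correct, n_bins=10):
    """Fuzzy-binned calibration error (illustrative sketch).

    Assumption: triangular memberships centred on the bin midpoints,
    normalised so that each sample's memberships sum to 1.
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    centers = (np.arange(n_bins) + 0.5) / n_bins
    width = 1.0 / n_bins
    # Membership is 1 at a bin centre and falls to 0 one bin width away.
    dist = np.abs(conf[:, None] - centers[None, :])
    mu = np.clip(1.0 - dist / width, 0.0, None)
    mu /= mu.sum(axis=1, keepdims=True)

    mass = mu.sum(axis=0)  # fuzzy size of each bin
    nonempty = mass > 0
    # Membership-weighted accuracy and confidence per bin.
    acc = (mu * correct[:, None]).sum(axis=0)[nonempty] / mass[nonempty]
    avg_conf = (mu * conf[:, None]).sum(axis=0)[nonempty] / mass[nonempty]
    return float((mass[nonempty] * np.abs(acc - avg_conf)).sum() / mass.sum())
```

Both functions take the top-label confidence of each prediction and a 0/1 array marking whether that prediction was correct. In the fuzzy variant, a score near a bin edge contributes to both adjacent bins, so a small change in confidence does not move the whole sample from one bin to another.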