The expected calibration error (ECE), which relies on binning, is widely used to evaluate the calibration performance of machine learning models, yet theoretical understanding of its estimation bias remains limited. In this paper, we present the first comprehensive analysis of the estimation bias under the two common binning strategies, uniform-mass and uniform-width binning. Our analysis establishes upper bounds on the bias with an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins for minimizing the estimation bias. We further extend the bias analysis to a generalization error analysis based on the information-theoretic approach, deriving upper bounds that can be evaluated numerically to assess how small the ECE is on unseen data. Experiments with deep learning models show that our bounds are nonvacuous, thanks to this information-theoretic generalization analysis.
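To make the two binning strategies concrete, the following is a minimal sketch of the common plug-in binned ECE estimator. It is not taken from the paper: the function name `binned_ece`, its parameters, and the use of the absolute gap between mean accuracy and mean confidence per bin are assumptions based on the standard definition, and the paper's exact estimator may differ.

```python
import numpy as np


def binned_ece(confidences, correct, n_bins=10, strategy="uniform_width"):
    """Plug-in binned ECE estimate (hypothetical helper, standard definition assumed).

    confidences: predicted confidences in [0, 1]
    correct:     1 if the prediction was correct, else 0
    strategy:    "uniform_width" (equal-width bins on [0, 1]) or
                 "uniform_mass" (bins holding roughly equal numbers of samples)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n = confidences.size

    if strategy == "uniform_width":
        # Equal-width bins; clip so that a confidence of exactly 1.0
        # falls into the last bin instead of an out-of-range one.
        bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    elif strategy == "uniform_mass":
        # Equal-count bins: rank the samples by confidence, then split
        # the ranks into n_bins contiguous groups.
        ranks = np.argsort(np.argsort(confidences, kind="stable"), kind="stable")
        bin_ids = (ranks * n_bins) // n
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Weighted absolute gap between accuracy and mean confidence.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.sum() / n * gap
    return float(ece)


if __name__ == "__main__":
    conf = [0.1, 0.1, 0.9, 0.9]
    hit = [0, 0, 1, 1]
    print(binned_ece(conf, hit, n_bins=2, strategy="uniform_width"))
    print(binned_ece(conf, hit, n_bins=2, strategy="uniform_mass"))
```

The number of bins `n_bins` is the quantity whose choice the bias bounds above are said to inform. This sketch leaves it as a free parameter and does not implement the optimal choice derived in the paper.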