The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in machine learning to assess the quality of prediction uncertainties in regression problems. Its estimation relies on binning the calibration data. In this short note, I illustrate an annoying property of the ENCE: for well-calibrated or nearly calibrated datasets, it is proportional to the square root of the number of bins. A similar behavior affects the calibration error based on the variance of z-scores (ZVE). In both cases, this property results from the use of a Mean Absolute Deviation (MAD) statistic to estimate calibration errors. This raises the question of how many bins to use for a reliable estimation of calibration error statistics. A solution is proposed to infer ENCE and ZVE values that do not depend on the number of bins for datasets assumed to be calibrated, which simultaneously provides a statistical calibration test. It is also shown that the ZVE is less sensitive than the ENCE to outlying errors or uncertainties.
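The dependence on the number of bins can be checked numerically with a minimal sketch. It assumes the usual binned definition of the ENCE, i.e. the mean over bins of |RMV - RMSE| / RMV, where RMV is the root mean variance (from the predicted uncertainties) and RMSE the root mean squared error in each bin. The function name `ence`, the equal-count binning on sorted uncertainties, and the synthetic calibrated dataset (uniform uncertainties, normal errors) are illustrative assumptions, not taken from the note itself.

```python
import numpy as np


def ence(errors, uncertainties, n_bins):
    """ENCE over equal-count bins of the data sorted by uncertainty.

    Assumed definition: mean over bins of |RMV - RMSE| / RMV.
    """
    order = np.argsort(uncertainties)
    e_bins = np.array_split(errors[order], n_bins)
    u_bins = np.array_split(uncertainties[order], n_bins)
    # Root mean squared error and root mean variance in each bin
    rmse = np.array([np.sqrt(np.mean(e**2)) for e in e_bins])
    rmv = np.array([np.sqrt(np.mean(u**2)) for u in u_bins])
    return np.mean(np.abs(rmv - rmse) / rmv)


# Synthetic calibrated dataset (an assumption made for illustration):
# each error is drawn from a normal law whose standard deviation
# equals the corresponding predicted uncertainty.
rng = np.random.default_rng(0)
n_points = 20_000
uncertainties = rng.uniform(0.5, 2.0, n_points)
errors = rng.normal(0.0, uncertainties)

for n_bins in (50, 200, 800):
    value = ence(errors, uncertainties, n_bins)
    # If ENCE grows as sqrt(n_bins), the last column stays roughly constant
    print(f"n_bins={n_bins:4d}  ENCE={value:.4f}  "
          f"ENCE/sqrt(n_bins)={value / np.sqrt(n_bins):.5f}")
```

For such a calibrated dataset, the ENCE does not tend to zero as one might expect. It grows with the number of bins, while the ratio to the square root of the number of bins should remain approximately stable, up to sampling noise.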