This short study presents an opportunistic approach to a (more) reliable validation method for the average calibration of prediction uncertainty. Considering that variance-based calibration metrics (ZMS, NLL, RCE...) are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP). It is shown on a large ensemble of molecular properties datasets that (1) sets of z-scores are well represented by Student's $t(\nu)$ distributions, $\nu$ being the number of degrees of freedom; (2) accurate estimation of 95 $\%$ prediction intervals can be obtained by the simple $2\sigma$ rule for $\nu>3$; and (3) the resulting PICPs are tested more quickly and reliably than variance-based calibration metrics. Overall, this method enables the testing of 20 $\%$ more datasets than ZMS testing. Conditional calibration is also assessed using the PICP approach.
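The two quantities at the heart of the abstract can be illustrated with a short sketch (this is not the study's code; the function names are illustrative, and `scipy` is assumed to be available). It checks that the $\pm 2\sigma$ interval of a Student's $t(\nu)$ distribution covers roughly 95 $\%$ of the probability mass once $\nu>3$, and shows a minimal PICP estimator over a set of z-scores:

```python
import numpy as np
from scipy import stats

def two_sigma_coverage(nu):
    """Probability mass of a Student's t(nu) distribution inside
    +/- 2*sigma, where sigma = sqrt(nu / (nu - 2)) is its standard
    deviation (defined for nu > 2)."""
    sigma = np.sqrt(nu / (nu - 2))
    return 2.0 * stats.t.cdf(2.0 * sigma, df=nu) - 1.0

def picp(z_scores, half_width=2.0):
    """Prediction Interval Coverage Probability: the fraction of
    z-scores falling inside [-half_width, +half_width]."""
    z = np.asarray(z_scores)
    return float(np.mean(np.abs(z) <= half_width))

for nu in (4, 10, 100):
    print(f"nu = {nu:3d}: 2-sigma coverage = {two_sigma_coverage(nu):.4f}")
```

For $\nu=4$ the interval half-width $2\sigma = 2\sqrt{2}$ slightly exceeds the 97.5 $\%$ quantile of $t(4)$ ($\approx 2.776$), so the coverage lands close to 0.95; as $\nu$ grows the coverage tends to the Gaussian value of $\approx 0.954$.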