Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. Naive measures such as last-layer scores are well known to yield overconfident estimates in overparametrized neural networks. Several methods, ranging from temperature scaling to various Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often supported by the numerical observation that they yield better-calibrated uncertainty measures. In this work, we provide a sharp comparison between popular uncertainty measures for binary classification in a mathematically tractable model for overparametrized neural networks: the random features model. We discuss a trade-off between classification accuracy and calibration, unveiling a double-descent-like behavior in the calibration curve of optimally regularized estimators as a function of overparametrization. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite its higher generalization error and overparametrization.
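To make the setting concrete, the following is a minimal numerical sketch, not the analysis of the paper. It fits an L2-regularized logistic classifier on random features of increasing width and reports test error alongside a binned expected calibration error (ECE) of the last-layer confidence. Everything here is an illustrative assumption: the tanh activation, the logistic teacher, the fixed penalty `lam` (not optimally tuned), the sample sizes, and the choice of ECE as a calibration proxy, which may differ from the calibration measure studied in this work.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def make_data(n, d, w_star, rng):
    # Gaussian inputs with labels drawn from a logistic teacher (assumption).
    X = rng.standard_normal((n, d))
    y = (rng.random(n) < sigmoid(X @ w_star)).astype(float)
    return X, y


def random_features(X, F):
    # Fixed random projection followed by a pointwise nonlinearity.
    return np.tanh(X @ F.T / np.sqrt(X.shape[1]))


def fit_logistic(Phi, y, lam, steps=500):
    # Gradient descent on the L2-regularized logistic loss.
    n, p = Phi.shape
    # Step size 1/L, with L an upper bound on the curvature of the loss.
    lr = 1.0 / (0.25 * np.linalg.norm(Phi, 2) ** 2 / n + lam)
    w = np.zeros(p)
    for _ in range(steps):
        w -= lr * (Phi.T @ (sigmoid(Phi @ w) - y) / n + lam * w)
    return w


def expected_calibration_error(conf, correct, n_bins=10):
    # Weighted average gap between mean confidence and accuracy per bin.
    bins = np.linspace(0.5, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece


rng = np.random.default_rng(0)
d, n_train, n_test, lam = 50, 400, 2000, 1e-3  # illustrative values
w_star = rng.standard_normal(d) / np.sqrt(d)
X_tr, y_tr = make_data(n_train, d, w_star, rng)
X_te, y_te = make_data(n_test, d, w_star, rng)

results = {}
for p in (25, 100, 400, 1600):  # number of random features
    F = rng.standard_normal((p, d))
    w = fit_logistic(random_features(X_tr, F), y_tr, lam)
    prob = sigmoid(random_features(X_te, F) @ w)
    conf = np.maximum(prob, 1.0 - prob)  # last-layer confidence
    correct = ((prob > 0.5).astype(float) == y_te).astype(float)
    results[p] = (1.0 - correct.mean(), expected_calibration_error(conf, correct))

for p, (err, ece) in results.items():
    print(f"p/n = {p / n_train:5.2f}  test error = {err:.3f}  ECE = {ece:.3f}")
```

Sweeping `lam` for each width and keeping the value with the lowest test error would give a rough empirical counterpart to the optimally regularized estimators discussed above.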