Comparing the quality of neural network uncertainty estimates for classification problems

Traditional deep learning (DL) models are powerful classifiers, but many approaches do not provide uncertainties for their estimates. Uncertainty quantification (UQ) methods for DL models have received increased attention in the literature due to their usefulness in decision making, particularly for high-consequence decisions. However, there has been little research on how to evaluate the quality of such methods. We use the statistical measures of frequentist interval coverage and interval width to evaluate the quality of credible intervals, and expected calibration error to evaluate the predicted classification confidence. These metrics are evaluated on Bayesian neural networks (BNN) fit using Markov chain Monte Carlo (MCMC) and variational inference (VI), bootstrapped neural networks (NN), Deep Ensembles (DE), and Monte Carlo (MC) dropout. We apply these UQ methods for DL to a hyperspectral image target detection problem and show the inconsistency of the different methods' results and the necessity of a UQ quality metric. To reconcile these differences and choose a UQ method that appropriately quantifies the uncertainty, we create a simulated data set with a fully parameterized probability distribution for a two-class classification problem. The gold standard MCMC performs the best overall, and the bootstrapped NN is a close second, requiring the same computational expense as DE. Through this comparison, we demonstrate that, for a given data set, different models can produce uncertainty estimates of markedly different quality. This in turn points to a great need for principled methods for assessing UQ quality in DL applications.
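The three metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes equal-width confidence bins for the expected calibration error, and it assumes that coverage is measured against a known true class probability, as a simulated data set would allow. The function names and the toy numbers are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins (an assumed binning scheme).

    confidences: predicted confidence of the chosen class, each in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def interval_coverage_and_width(lower, upper, truth):
    """Fraction of intervals containing the true value, and mean interval width.

    lower, upper: interval endpoints per test point
    truth:        true quantity per test point (here assumed to be the known
                  class probability from a simulated distribution)
    """
    n = len(truth)
    covered = sum(lo <= t <= hi for lo, hi, t in zip(lower, upper, truth))
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


# Toy, made-up numbers purely to show the call pattern.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
coverage, width = interval_coverage_and_width(
    lower=[0.1, 0.4, 0.6], upper=[0.3, 0.6, 0.9], truth=[0.2, 0.7, 0.8]
)
```

Under this reading, a well-behaved 95% interval would show coverage close to 0.95 with as small a mean width as possible, and a well-calibrated classifier would have an ECE close to zero.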

