Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification has become exceedingly difficult, and a plethora of techniques have been proposed. In this case study, we use the unifying framework of approximate Bayesian inference combined with empirical tests on carefully created synthetic classification datasets to investigate qualitative properties of six different probabilistic machine learning algorithms for class probability and uncertainty estimation: (i) a neural network ensemble, (ii) a neural network ensemble with conflictual loss, (iii) evidential deep learning, (iv) a single neural network with Monte Carlo Dropout, (v) Gaussian process classification, and (vi) a Dirichlet process mixture model. We check whether the algorithms produce uncertainty estimates that reflect commonly desired properties, such as being well calibrated and exhibiting an increase in uncertainty for out-of-distribution data points. Our results indicate that all algorithms show reasonably good calibration performance on our synthetic test sets, but none of the deep-learning-based algorithms provide uncertainties that consistently reflect lack of experimental evidence for out-of-distribution data points. We hope our study may serve as a clarifying example for researchers who are using or developing methods of uncertainty estimation for scientific data-driven modeling and analysis.
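One common way to turn the probabilistic outputs described above into a scalar uncertainty is the entropy of the mean predicted class distribution, averaged over ensemble members or Monte Carlo Dropout forward passes. The sketch below is only an illustration of this general idea (not the paper's actual experimental code); the arrays of softmax outputs are hypothetical. When members disagree, as one hopes happens on out-of-distribution inputs, the predictive entropy is high:

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the mean predicted class distribution.

    probs: array of shape (n_members, n_classes) holding softmax
    outputs from ensemble members or MC Dropout forward passes.
    """
    mean_p = probs.mean(axis=0)
    return float(-(mean_p * np.log(mean_p + 1e-12)).sum())

# Members agree on class 0 -> low predictive entropy (low uncertainty).
confident = np.array([[0.95, 0.03, 0.02],
                      [0.97, 0.02, 0.01],
                      [0.94, 0.04, 0.02]])

# Members disagree -> mean distribution is near-uniform,
# so predictive entropy is high (high uncertainty).
disagreeing = np.array([[0.90, 0.05, 0.05],
                        [0.05, 0.90, 0.05],
                        [0.05, 0.05, 0.90]])

assert predictive_entropy(disagreeing) > predictive_entropy(confident)
```

For the disagreeing case the mean distribution is exactly uniform over three classes, so the entropy approaches its maximum of ln 3; the study's finding is that deep-learning-based methods often fail to produce such disagreement on out-of-distribution points.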