Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one source of uncertainty. This paper presents the first benchmark of uncertainty disentanglement. We reimplement and evaluate a comprehensive set of uncertainty estimators, ranging from Bayesian through evidential to deterministic ones, across a diverse range of uncertainty tasks on ImageNet. We find that, despite recent theoretical endeavors, no existing approach provides pairs of disentangled uncertainty estimators in practice. We further find that specialized uncertainty tasks are harder than predictive uncertainty tasks, where we observe saturating performance. Our results both provide practical advice on which uncertainty estimators to use for which specific task and reveal opportunities for future research toward task-centric and disentangled uncertainties. All our reimplementations and Weights & Biases logs are available at https://github.com/bmucsanyi/untangle.