What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 ImageNet Classifiers

When deployed for risk-sensitive tasks, deep neural networks must include an uncertainty estimation mechanism. Here we examine the relationship between deep architectures and their training regimes on the one hand, and their selective prediction and uncertainty estimation performance on the other. We consider some of the most popular previously proposed performance metrics, including AUROC, ECE, and AURC, as well as coverage for a selective accuracy constraint. We present a novel and comprehensive study of the selective prediction and uncertainty estimation performance of 523 existing pretrained deep ImageNet classifiers that are available in popular repositories. We identify numerous previously unknown factors that affect uncertainty estimation and examine the relationships between the different metrics. We find that distillation-based training regimes consistently yield better uncertainty estimation than other training schemes such as vanilla training, pretraining on a larger dataset, and adversarial training. Moreover, we find a subset of ViT models that outperforms all other models in terms of uncertainty estimation performance. For example, we discovered an unprecedented 99% top-1 selective accuracy on ImageNet at 47% coverage (and 95% top-1 accuracy at 80% coverage) for a ViT model, whereas a competing EfficientNet-V2-XL cannot meet these accuracy constraints at any level of coverage. Our companion paper, also published at ICLR 2023 (A framework for benchmarking class-out-of-distribution detection and its application to ImageNet), examines the performance of these classifiers in a class-out-of-distribution setting.
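To make two of the metrics named above concrete, here is a minimal sketch of how AURC and coverage for a selective accuracy constraint could be computed from per-sample confidences and correctness flags. The function names (`risk_coverage_curve`, `aurc`, `coverage_for_accuracy`) and the toy data are illustrative assumptions, not taken from the paper. The sketch is purely empirical: it offers no statistical guarantee, it breaks confidence ties by input order, and the paper's exact evaluation procedure may differ.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(
    confidences: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (coverage, selective_risk) pairs, one per accepted-prefix size.

    Samples are ranked by confidence, highest first. At step k the classifier
    answers only its k most confident samples and abstains on the rest.
    Ties in confidence keep their input order (a simplification).
    """
    n = len(confidences)
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    curve = []
    errors = 0
    for k, idx in enumerate(order, start=1):
        if not correct[idx]:
            errors += 1
        curve.append((k / n, errors / k))  # coverage, error rate on accepted set
    return curve


def aurc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the risk-coverage curve: mean selective risk over all coverages."""
    curve = risk_coverage_curve(confidences, correct)
    return sum(risk for _, risk in curve) / len(curve)


def coverage_for_accuracy(
    confidences: Sequence[float], correct: Sequence[bool], target_accuracy: float
) -> float:
    """Largest empirical coverage whose selective accuracy meets the target.

    Returns 0.0 if no coverage level satisfies the constraint.
    """
    best = 0.0
    for coverage, risk in risk_coverage_curve(confidences, correct):
        if 1.0 - risk >= target_accuracy:
            best = coverage  # the curve is not monotone, so scan every point
    return best


if __name__ == "__main__":
    # Hypothetical toy data: six predictions with their softmax confidences.
    toy_conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
    toy_correct = [True, True, True, False, True, False]
    print("AURC:", aurc(toy_conf, toy_correct))
    print("Coverage at 100% accuracy:", coverage_for_accuracy(toy_conf, toy_correct, 1.0))
```

Read this way, the ViT result quoted above says that the 47% most confident predictions of that model are 99% accurate, while a return value of 0.0 from `coverage_for_accuracy` corresponds to a model that cannot meet the constraint at any coverage.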
