How to fix a broken confidence estimator: Evaluating post-hoc methods for selective classification with deep neural networks

This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. We focus on so-called post-hoc methods, which replace the confidence estimator of a given classifier without retraining or modifying it and are therefore practically appealing. Considering neural networks with softmax outputs, our goal is to identify the best confidence estimator that can be computed directly from the unnormalized logits. This problem is motivated by the intriguing observation in recent work that many classifiers appear to have a "broken" confidence estimator, in the sense that their selective classification performance is much worse than would be expected from their accuracies. We perform an extensive experimental study of many existing and proposed confidence estimators applied to 84 pretrained ImageNet classifiers available from popular repositories. Our results show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance, completely fixing the pathological behavior observed in many classifiers. As a consequence, the selective classification performance of any classifier becomes almost entirely determined by its accuracy. Moreover, these results are shown to be consistent under distribution shift. We also investigate why certain classifiers innately have a good confidence estimator that apparently cannot be improved by post-hoc methods.
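To make the post-hoc idea concrete, the sketch below computes two confidence estimators from a matrix of logits: the maximum softmax probability (MSP) as a baseline, and the maximum entry of the logit vector after dividing it by its $p$-norm. It then scores each estimator by its risk-coverage behavior: the classifier keeps its most confident predictions and abstains on the rest. Everything here is an illustrative assumption rather than the paper's code: the helper names (`msp`, `max_logit_pnorm`, `selective_risk`, `aurc`) are hypothetical, `p=2` is an arbitrary default rather than a value taken from the paper, the area under the risk-coverage curve (AURC) is used as one common summary metric, and the toy logits are made up.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def msp(logits: np.ndarray) -> np.ndarray:
    # Baseline confidence: maximum softmax probability per sample.
    return softmax(logits).max(axis=1)


def max_logit_pnorm(logits: np.ndarray, p: float = 2.0, eps: float = 1e-12) -> np.ndarray:
    # Hypothetical helper: divide each sample's logit vector by its p-norm,
    # then take the largest entry. p=2 is an illustrative assumption.
    norms = np.linalg.norm(logits, ord=p, axis=1, keepdims=True)
    return (logits / (norms + eps)).max(axis=1)


def selective_risk(confidence: np.ndarray, correct: np.ndarray, coverage: float) -> float:
    # Error rate on the `coverage` fraction of samples with the highest
    # confidence; the classifier abstains on all remaining samples.
    k = max(1, int(round(coverage * len(confidence))))
    order = np.argsort(-confidence, kind="stable")
    return float(1.0 - correct[order[:k]].mean())


def aurc(confidence: np.ndarray, correct: np.ndarray) -> float:
    # Area under the risk-coverage curve: mean selective risk over the
    # coverages k/n for k = 1..n (lower is better).
    order = np.argsort(-confidence, kind="stable")
    errors = 1.0 - correct[order].astype(float)
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())


# Toy example: 4 samples, 3 classes (made-up logits, not data from the paper).
logits = np.array([[4.0, 1.0, 0.0],
                   [2.0, 1.9, 0.0],
                   [0.5, 3.0, 0.2],
                   [1.0, 0.0, 1.2]])
labels = np.array([0, 1, 1, 0])
correct = logits.argmax(axis=1) == labels  # [True, False, True, False]

conf = max_logit_pnorm(logits)
print("AURC (MSP):              ", aurc(msp(logits), correct))
print("AURC (p-norm max logit): ", aurc(conf, correct))
print("risk at 50% coverage:    ", selective_risk(conf, correct, 0.5))
```

Because each logit vector is divided by its own norm, the normalized score does not depend on the overall magnitude of that sample's logits: multiplying one row by a positive constant leaves its score essentially unchanged. On this tiny example both estimators happen to rank the samples identically, so it says nothing about which estimator is better on real classifiers; the comparison in the abstract comes from the paper's experiments, not from this sketch.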