Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores, hindering their application in critical systems. In this paper, we propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance. We observe that, under the cross-entropy loss, model predictions are optimized to align with the corresponding labels by increasing logit magnitude or refining logit direction. For atypical samples, however, the image content and the assigned label may diverge. This discrepancy can lead to overfitting on atypical samples, ultimately resulting in the overconfidence issue that we aim to address. To tackle this problem, we devise a metric that quantifies the typicalness of each sample, enabling dynamic adjustment of the logit magnitude during training. By allowing atypical samples to be adequately fitted while preserving a reliable logit direction, the problem of overconfidence can be mitigated. TAL has been extensively evaluated on benchmark datasets, and the results demonstrate its superiority over existing failure detection methods. Specifically, TAL achieves a more than 5% improvement on CIFAR100 in terms of the Area Under the Risk-Coverage Curve (AURC) compared to the state-of-the-art. Code is available at https://github.com/liuyijungoon/TAL.
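The core idea of decoupling logit direction from logit magnitude can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `typicalness_aware_logits`, the linear mapping of typicalness to magnitude, and the bounds `mag_min`/`mag_max` are all assumptions for exposition; the paper's actual typicalness metric and scaling rule may differ.

```python
import numpy as np

def typicalness_aware_logits(features, weight, typicalness,
                             mag_min=1.0, mag_max=10.0):
    """Sketch of typicalness-aware logit scaling (assumed form).

    features:    (N, D) sample features
    weight:      (C, D) classifier weights
    typicalness: (N,) per-sample scores in [0, 1]; 1 = typical

    Keeps the logit *direction* from the classifier, but sets the
    logit *magnitude* from the typicalness score, so atypical samples
    (low typicalness) yield a softer softmax and lower confidence.
    """
    raw = features @ weight.T                                   # (N, C) raw logits
    direction = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    magnitude = mag_min + typicalness * (mag_max - mag_min)     # typical -> large norm
    return direction * magnitude[:, None]
```

Because the direction is untouched, the predicted class is unchanged; only the softmax temperature effectively varies per sample, which is what lets atypical samples be fitted without inflating their confidence.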