Trustworthy machine learning is of primary importance to the practical deployment of deep learning models. While state-of-the-art models achieve astonishingly good accuracy, recent literature reveals that their predictive confidence scores unfortunately cannot be trusted: for example, they are often overconfident when they make wrong predictions, and even when given obvious outliers. In this paper, we introduce self-supervised probing, a new approach that enables us to check and mitigate the overconfidence issue for a trained model, thereby improving its trustworthiness. We provide a simple yet effective framework that can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner. Extensive experiments on three trustworthiness-related tasks (misclassification detection, calibration, and out-of-distribution detection) across various benchmarks verify the effectiveness of our proposed probing framework.