In many classification applications, the prediction of a deep neural network (DNN) based classifier needs to be accompanied by some indication of confidence. Two popular post-processing approaches for this purpose are: 1) calibration: modifying the classifier's softmax values so that their maximum (associated with the prediction) better estimates the probability that the prediction is correct; and 2) conformal prediction (CP): devising a score (based on the softmax values) from which a set of predictions with theoretically guaranteed marginal coverage of the correct class is produced. Although both types of indication can be desired in practice, the interplay between them has not yet been investigated. Toward filling this gap, in this paper we study the effect of temperature scaling, arguably the most common calibration technique, on prominent CP methods. We start with an extensive empirical study which shows, among other insights, that calibration surprisingly has a detrimental effect on popular adaptive CP methods: it frequently leads to larger prediction sets. We then analyze this behavior theoretically. We reveal several mathematical properties of the procedure, based on which we provide an explanation for the phenomenon. Our study suggests that it may be worthwhile to apply adaptive CP methods, chosen for their enhanced conditional coverage, to softmax values obtained prior to (or after canceling) temperature scaling calibration.
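To make the two post-processing steps concrete, the following is a minimal sketch of temperature scaling followed by split conformal prediction with a cumulative-probability (APS-style) score. It is an illustration under stated assumptions, not the paper's implementation: the function names, the grid search used to fit the temperature, the non-randomized form of the score, and the synthetic logits are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature (temperature > 0)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Assumed fitting rule: grid search for the temperature minimizing NLL."""
    idx = np.arange(len(labels))
    nll = [-np.log(softmax(logits, t)[idx, labels] + 1e-12).mean() for t in grid]
    return float(grid[int(np.argmin(nll))])

def _sorted_cumsum(probs):
    """Class order by decreasing probability and the cumulative mass in that order."""
    order = np.argsort(-probs, axis=1)
    cumsum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    return order, cumsum

def aps_scores(probs, labels):
    """Non-randomized adaptive score: mass of all classes ranked at or above the true one."""
    order, cumsum = _sorted_cumsum(probs)
    rank = np.argmax(order == labels[:, None], axis=1)
    return cumsum[np.arange(len(labels)), rank]

def conformal_threshold(scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha)).

    If k exceeds n the exact threshold would be infinite; this sketch clips k to n.
    """
    n = len(scores)
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
    return float(np.sort(scores)[k - 1])

def aps_sets(probs, qhat):
    """Boolean (samples x classes) mask of classes whose score is at most qhat."""
    order, cumsum = _sorted_cumsum(probs)
    sets = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(sets, order, cumsum <= qhat, axis=1)
    return sets

# Synthetic logits (an assumption for illustration; no real model is involved).
rng = np.random.default_rng(0)
n, num_classes = 2000, 10
labels = rng.integers(0, num_classes, size=n)
logits = rng.normal(size=(n, num_classes))
logits[np.arange(n), labels] += 2.0  # make the true class more likely
logits *= 3.0                        # sharpen the softmax outputs
cal, test = slice(0, 1000), slice(1000, None)

T = fit_temperature(logits[cal], labels[cal])
for name, t in [("before temperature scaling", 1.0), ("after temperature scaling", T)]:
    qhat = conformal_threshold(aps_scores(softmax(logits[cal], t), labels[cal]), alpha=0.1)
    sets = aps_sets(softmax(logits[test], t), qhat)
    coverage = sets[np.arange(sets.shape[0]), labels[test]].mean()
    print(f"{name}: T={t:.2f}, mean set size={sets.sum(axis=1).mean():.2f}, coverage={coverage:.3f}")
```

Running the loop with `t = 1.0` and with the fitted `T` gives a side-by-side view of mean prediction-set size and empirical coverage, which is the kind of comparison the abstract refers to. The numbers produced on this synthetic data say nothing about the paper's results.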