In many classification applications, the prediction of a deep neural network (DNN) based classifier needs to be accompanied by some indication of confidence. Two popular approaches to this end are: 1) calibration, which modifies the classifier's softmax values so that the maximal value better estimates the probability that the prediction is correct; and 2) conformal prediction (CP), which produces a prediction set of candidate labels that contains the true label with a user-specified probability, guaranteeing marginal coverage rather than, e.g., per-class coverage. In practice, both types of indication are desirable, yet the interplay between them has not been investigated so far. We start this paper with an extensive empirical study of the effect of the popular Temperature Scaling (TS) calibration on prominent CP methods and reveal that, while it improves the class-conditional coverage of adaptive CP methods, it surprisingly has a negative effect on their prediction set sizes. Subsequently, we explore the effect of TS beyond its calibration application and offer simple guidelines for practitioners to trade off prediction set size against conditional coverage of adaptive CP methods while effectively combining them with calibration. Finally, we present a theoretical analysis of the effect of TS on the prediction set sizes, revealing several mathematical properties of the procedure, which we use to explain this unintuitive phenomenon.
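To make the two ingredients concrete, the following is a minimal sketch of how temperature scaling can be combined with an adaptive CP method. It assumes a non-randomized APS-style score (the cumulative softmax mass of all classes ranked at or above a given class) and the standard split-conformal quantile; the function names (`softmax`, `aps_all_scores`, `conformal_threshold`, `evaluate`, `synthetic_data`), the synthetic logits, and the temperature values are illustrative assumptions and are not taken from the paper's experiments.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Temperature-scaled softmax: T > 1 flattens, T < 1 sharpens."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def aps_all_scores(probs):
    """Non-randomized APS-style score for every class: the total
    probability of all classes ranked at or above that class."""
    order = np.argsort(-probs, axis=1)  # classes by descending probability
    cum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    scores = np.empty_like(probs)
    np.put_along_axis(scores, order, cum, axis=1)  # back to class order
    return scores


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration score, or +inf if the calibration set is too small."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]


def evaluate(cal_logits, cal_labels, test_logits, test_labels, T, alpha=0.1):
    """Return (marginal coverage, mean set size) on the test split when
    the same temperature T is applied before computing CP scores."""
    cal_s = aps_all_scores(softmax(cal_logits, T))
    true_s = cal_s[np.arange(len(cal_labels)), cal_labels]
    qhat = conformal_threshold(true_s, alpha)
    sets = aps_all_scores(softmax(test_logits, T)) <= qhat  # boolean mask
    coverage = sets[np.arange(len(test_labels)), test_labels].mean()
    return coverage, sets.sum(axis=1).mean()


def synthetic_data(n, num_classes, rng):
    """Hypothetical data: labels are drawn from softmax(logits) at T = 1."""
    logits = 3.0 * rng.normal(size=(n, num_classes))
    cdf = np.cumsum(softmax(logits), axis=1)
    u = rng.random(n)
    labels = np.minimum((cdf < u[:, None]).sum(axis=1), num_classes - 1)
    return logits, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal_x, cal_y = synthetic_data(2000, 10, rng)
    test_x, test_y = synthetic_data(2000, 10, rng)
    for T in (0.5, 1.0, 2.0):  # arbitrary temperatures for illustration
        cov, size = evaluate(cal_x, cal_y, test_x, test_y, T)
        print(f"T={T}: coverage={cov:.3f}, mean set size={size:.2f}")
```

Running the script sweeps a few temperatures and reports marginal coverage and mean prediction set size for each, which is the kind of comparison the empirical study describes. Per-class coverage could be examined in the same way by averaging the coverage indicator within each label.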