On Calibration and Conformal Prediction of Deep Classifiers

In many classification applications, the prediction of a deep neural network (DNN) based classifier needs to be accompanied by some indication of confidence. Two popular post-processing approaches for this purpose are: 1) calibration: modifying the classifier's softmax values such that their maximum (associated with the prediction) better estimates the correctness probability; and 2) conformal prediction (CP): devising a score (based on the softmax values) from which a set of predictions with theoretically guaranteed marginal coverage of the correct class is produced. While in practice both types of indication can be desired, the interplay between them has so far not been investigated. Toward filling this gap, in this paper we study the effect of temperature scaling, arguably the most common calibration technique, on prominent CP methods. We start with an extensive empirical study which shows, among other insights, that calibration surprisingly has a detrimental effect on popular adaptive CP methods: it frequently leads to larger prediction sets. We then analyze this behavior theoretically. We reveal several mathematical properties of the procedure, on the basis of which we offer an explanation for the phenomenon. Our study suggests that it may be worthwhile to apply adaptive CP methods, chosen for their enhanced conditional coverage, to softmax values obtained prior to (or after canceling) temperature scaling calibration.
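The two post-processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard form of temperature scaling (softmax of the logits divided by a scalar T), a non-randomized cumulative-probability score in the style of adaptive CP methods, and the usual split-conformal quantile with finite-sample correction. All function names (`softmax`, `aps_score`, `conformal_threshold`, `prediction_set`) are hypothetical, and the logits in the demo are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; temperature=1.0 leaves the model unchanged."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aps_score(probs, label):
    """Cumulative probability of all classes ranked at or above `label`.

    Assumption: non-randomized variant of an adaptive score.
    """
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    cumulative = 0.0
    for k in order:
        cumulative += probs[k]
        if k == label:
            return cumulative
    raise ValueError("label is not a valid class index")


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every class is included
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose score does not exceed the calibrated threshold."""
    return [k for k in range(len(probs)) if aps_score(probs, k) <= threshold]


if __name__ == "__main__":
    # Made-up calibration data: (logits, true label) pairs.
    calibration = [([2.0, 1.0, 0.0], 0), ([0.5, 1.5, 0.0], 1),
                   ([0.2, 0.1, 1.0], 2), ([1.0, 0.9, 0.8], 1)]
    test_logits = [1.2, 1.0, 0.1]
    for temperature in (1.0, 2.0):
        scores = [aps_score(softmax(z, temperature), y) for z, y in calibration]
        q = conformal_threshold(scores, alpha=0.25)
        print(temperature, prediction_set(softmax(test_logits, temperature), q))
```

Running the conformal step once on the original softmax values and once on the temperature-scaled ones, as the demo loop does, is the kind of comparison the study makes; the toy data here is far too small to say anything about set sizes in practice.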

