Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper-bound the intrinsic uncertainty, as measured by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
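To make the coverage guarantee concrete, the following is a minimal sketch of standard split conformal prediction for classification, using synthetic softmax outputs in place of a trained model. All data, the 1-minus-true-class-probability score, and the miscoverage level `alpha = 0.1` are illustrative assumptions, not the specific methods or experiments of this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy stand-in for a classifier: random softmax probabilities (3 classes).
n_cal, n_test, n_classes = 500, 5, 3
cal_probs = softmax(rng.normal(size=(n_cal, n_classes)))
cal_labels = rng.integers(0, n_classes, size=n_cal)
test_probs = softmax(rng.normal(size=(n_test, n_classes)))

alpha = 0.1  # target miscoverage: sets contain the true label w.p. >= 1 - alpha

# Nonconformity score on the held-out calibration set:
# one minus the probability assigned to the true label.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

# Conformal quantile with the finite-sample (n+1) correction.
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

# Prediction set: every label whose score is below the quantile.
# Larger sets signal higher uncertainty about that input.
pred_sets = [np.where(1.0 - p <= q)[0] for p in test_probs]
```

Exchangeability of calibration and test points is the only assumption needed for the marginal coverage guarantee; no distributional form is imposed.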