Conditional validity and length efficiency are two crucial aspects of conformal prediction (CP). Conditional validity ensures accurate uncertainty quantification for data subpopulations, while length efficiency ensures that the prediction sets remain informative. Despite significant efforts to address each of these issues individually, a principled framework that reconciles the two objectives has been missing from the CP literature. In this paper, we develop Conformal Prediction with Length-Optimization (CPL), a novel and practical framework that constructs prediction sets with (near-)optimal length while ensuring conditional validity under various classes of covariate shifts, including the key cases of marginal and group-conditional coverage. In the infinite-sample regime, we provide strong duality results which indicate that CPL achieves conditional validity and length optimality. In the finite-sample regime, we show that CPL constructs conditionally valid prediction sets. Our extensive empirical evaluations show that CPL produces smaller prediction sets than state-of-the-art methods across diverse real-world and synthetic datasets in classification, regression, and large language model-based multiple-choice question answering. An implementation of our algorithm is available at https://github.com/shayankiyani98/CP.
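To make the tension between conditional validity and length concrete, the following is a minimal sketch of standard split conformal prediction, not the CPL algorithm itself. Everything in it is an illustrative assumption: the synthetic two-group data, the fixed predictor `predict`, the absolute-residual score, and the helper names `conformal_quantile` and `simulate`. It compares one marginal threshold against per-group thresholds and reports, for each group, the empirical coverage and the interval length.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points for this level
    return sorted(scores)[k - 1]


def simulate(n_cal=2000, n_test=2000, alpha=0.1, seed=0):
    """Return per-group (coverage, length) for marginal and group-wise thresholds."""
    rng = random.Random(seed)

    def draw(n):
        # Hypothetical data: group 0 has low noise, group 1 has high noise.
        data = []
        for _ in range(n):
            g = rng.randint(0, 1)
            x = rng.uniform(-1, 1)
            y = 2 * x + rng.gauss(0, 0.5 if g == 0 else 2.0)
            data.append((g, x, y))
        return data

    def predict(x):
        return 2 * x  # stand-in for a pre-trained model (assumption)

    def score(x, y):
        return abs(y - predict(x))  # absolute-residual conformity score

    cal, test = draw(n_cal), draw(n_test)

    # One threshold for everyone (marginal coverage only).
    q_marginal = conformal_quantile([score(x, y) for _, x, y in cal], alpha)
    # One threshold per group (group-conditional coverage).
    q_group = {
        g: conformal_quantile([score(x, y) for gg, x, y in cal if gg == g], alpha)
        for g in (0, 1)
    }

    def report(threshold_for):
        out = {}
        for g in (0, 1):
            pts = [(x, y) for gg, x, y in test if gg == g]
            covered = sum(score(x, y) <= threshold_for(g) for x, y in pts)
            # Interval is [predict(x) - q, predict(x) + q], so its length is 2q.
            out[g] = (covered / len(pts), 2 * threshold_for(g))
        return out

    return report(lambda g: q_marginal), report(lambda g: q_group[g])


if __name__ == "__main__":
    marginal, groupwise = simulate()
    for name, res in (("marginal", marginal), ("group-wise", groupwise)):
        for g, (cov, length) in res.items():
            print(f"{name:10s} group {g}: coverage={cov:.3f} length={length:.2f}")
```

In this toy setting the single marginal threshold tends to over-cover the low-noise group and under-cover the high-noise group, while per-group thresholds restore coverage in each group at the cost of longer intervals for the noisy group. The sketch only illustrates the two objectives that the abstract describes; it makes no attempt to optimize length.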