Conditional validity and length efficiency are two crucial aspects of conformal prediction (CP). Conditional validity ensures accurate uncertainty quantification for data subpopulations, while length efficiency ensures that the prediction sets remain informative and non-trivial. Despite significant efforts to address each of these issues individually, a principled framework that reconciles the two objectives has been missing from the CP literature. In this paper, we develop Conformal Prediction with Length-Optimization (CPL), a novel framework that constructs prediction sets with (near-)optimal length while ensuring conditional validity under various classes of covariate shifts, including the key cases of marginal and group-conditional coverage. In the infinite-sample regime, we provide strong duality results that indicate that CPL achieves both conditional validity and length optimality. In the finite-sample regime, we show that CPL constructs conditionally valid prediction sets. Extensive empirical evaluations show that CPL produces smaller prediction sets than state-of-the-art methods across diverse real-world and synthetic datasets in classification, regression, and text-related settings.
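To make the two notions of coverage named in the abstract concrete, the following is a minimal sketch of standard split conformal prediction with one marginal threshold and with per-group thresholds. It is not the CPL method, only a baseline illustrating why coverage and length interact. The toy data (two disjoint groups with noise standard deviations 1 and 3), the constant predictor f(x) = 0, and all function names are assumptions made for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    # Use the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank
    # exceeds n, the only valid threshold is +infinity.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")
    return float(np.sort(scores)[k - 1])


def group_thresholds(scores, groups, alpha):
    """One threshold per group label (assumes disjoint groups)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    return {int(g): conformal_quantile(scores[groups == g], alpha)
            for g in np.unique(groups)}


def simulate(n_per_group, rng):
    """Hypothetical toy data: group 0 has noise sd 1, group 1 has noise sd 3."""
    groups = np.repeat([0, 1], n_per_group)
    noise_sd = np.where(groups == 0, 1.0, 3.0)
    y = rng.normal(0.0, noise_sd)
    scores = np.abs(y)  # absolute residual of the assumed predictor f(x) = 0
    return scores, groups


def demo(alpha=0.1, seed=0):
    """Compare per-group coverage and interval length for both thresholds."""
    rng = np.random.default_rng(seed)
    cal_scores, cal_groups = simulate(2000, rng)
    test_scores, test_groups = simulate(5000, rng)

    q_marginal = conformal_quantile(cal_scores, alpha)
    q_group = group_thresholds(cal_scores, cal_groups, alpha)

    report = {}
    for g in (0, 1):
        s = test_scores[test_groups == g]
        report[g] = {
            "marginal_cov": float(np.mean(s <= q_marginal)),
            "group_cov": float(np.mean(s <= q_group[g])),
            # The interval [-q, q] around the prediction has length 2q.
            "marginal_len": 2 * q_marginal,
            "group_len": 2 * q_group[g],
        }
    return report


if __name__ == "__main__":
    for g, stats in demo().items():
        print(g, stats)
```

In this toy setting, the single marginal threshold tends to under-cover the noisier group while giving the quieter group intervals that are longer than necessary. The per-group thresholds restore coverage in each group, at the cost of longer intervals for the noisier group.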