Conformal prediction (CP) is an emerging uncertainty quantification framework that constructs a prediction set covering the true label with a pre-specified marginal or conditional probability. Although valid coverage guarantees have been extensively studied for classification problems, CP often produces prediction sets that are too large to be practically useful. This issue is exacerbated in the class-conditional coverage setting on classification tasks with many and/or imbalanced classes. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce prediction set sizes while achieving class-conditional coverage, where valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method, which uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that RC3P achieves class-wise coverage regardless of the classifier and data distribution. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and a 26.25% reduction in prediction set sizes on average.
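To make the contrast between uniform class-wise thresholding and rank-calibrated thresholding concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions that the abstract does not state: the nonconformity score is taken to be 1 - p(y|x), the rank cutoff for each class is the smallest k whose empirical class-wise top-k error is below alpha, and the miscoverage level for that class is reduced by that error. All function names (`conformal_quantile`, `label_ranks`, `calibrate`, `predict_sets`) are hypothetical. Because the empirical top-k error stands in for the true one, the rank-calibrated branch does not carry the coverage guarantee proved in the paper.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) quantile; +inf if there are too few points."""
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))  # 1-based order statistic
    if n == 0 or rank > n:
        return np.inf
    return np.sort(scores)[rank - 1]

def label_ranks(probs):
    """ranks[i, c] = 1-based rank of class c in row i (1 = most probable)."""
    order = np.argsort(-probs, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(probs.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, probs.shape[1] + 1)
    return ranks

def calibrate(probs, labels, alpha, use_rank=True):
    """Return per-class score thresholds and per-class rank cutoffs.

    use_rank=False: plain class-conditional thresholding (CCP-style).
    use_rank=True:  assumed rank calibration, as described in the lead-in.
    """
    n_classes = probs.shape[1]
    scores = 1.0 - probs                  # assumed nonconformity score
    ranks = label_ranks(probs)
    thresholds = np.full(n_classes, np.inf)
    max_rank = np.full(n_classes, n_classes)
    for c in range(n_classes):
        mask = labels == c
        if not mask.any():
            continue                      # no calibration data: keep +inf
        alpha_c = alpha
        if use_rank:
            true_ranks = ranks[mask, c]
            for k in range(1, n_classes + 1):
                err = np.mean(true_ranks > k)   # empirical top-k error
                if err < alpha:                 # always true at k = n_classes
                    max_rank[c] = k
                    alpha_c = alpha - err       # spend the remaining budget
                    break
        thresholds[c] = conformal_quantile(scores[mask, c], alpha_c)
    return thresholds, max_rank

def predict_sets(probs, thresholds, max_rank):
    """Boolean matrix: entry [i, c] is True if class c is in the set for row i."""
    scores = 1.0 - probs
    ranks = label_ranks(probs)
    return (scores <= thresholds) & (ranks <= max_rank)
```

Calling `calibrate(cal_probs, cal_labels, alpha=0.1, use_rank=False)` followed by `predict_sets(test_probs, thresholds, max_rank)` gives the uniform class-wise baseline; switching to `use_rank=True` additionally excludes any class whose predicted rank exceeds its calibrated cutoff. Choosing the smallest admissible k is only one possible rule, and a larger k leaves more of the miscoverage budget for the score threshold.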