Machine learning classification tasks often benefit from predicting a set of possible labels with confidence scores, rather than a single label, so that the output captures the model's uncertainty. Existing methods, however, struggle with high-dimensional data and with the poorly calibrated probabilities that modern classification models produce. We propose a conformal prediction method built on a rank-based score function, suited to classifiers that order labels correctly even when their probabilities are not well calibrated. The method constructs prediction sets that achieve the desired coverage rate while keeping their size under control. We give a theoretical analysis of the expected size of these prediction sets in terms of the rank distribution of the underlying classifier. Extensive experiments on various datasets show that the method outperforms existing techniques. Our contributions are the conformal prediction method itself, the theoretical analysis, and the empirical evaluation. By providing reliable uncertainty quantification, this work supports the practical deployment of machine learning systems.
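To make the idea of a rank-based score concrete, the following is a minimal sketch of standard split conformal prediction in which the nonconformity score is simply the rank of the true label in the classifier's ordering. It is an illustration under stated assumptions, not the method proposed in this paper: the function names (`rank_scores`, `calibrate_rank_threshold`, `predict_sets`), the use of a held-out calibration set, and the treatment of ties are all hypothetical choices made for the example.

```python
import numpy as np

def rank_scores(probs, labels):
    """Rank of the true label in each row (1 = highest-scored label)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    true_scores = probs[np.arange(len(labels)), labels]
    # Rank = 1 + number of labels scored strictly higher than the true one.
    return 1 + (probs > true_scores[:, None]).sum(axis=1)

def calibrate_rank_threshold(cal_probs, cal_labels, alpha):
    """Rank threshold k targeting coverage >= 1 - alpha (split conformal)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    scores = np.sort(rank_scores(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile index: ceil((n + 1) * (1 - alpha)).
    idx = int(np.ceil((n + 1) * (1 - alpha)))
    if idx > n:
        # Too few calibration points for this alpha: include every label.
        return cal_probs.shape[1]
    return int(scores[idx - 1])

def predict_sets(test_probs, k):
    """Boolean mask of shape (n_test, n_labels): True where rank <= k."""
    test_probs = np.asarray(test_probs, dtype=float)
    # ranks[i, j] = 1 + number of labels scored strictly higher than label j.
    higher = test_probs[:, None, :] > test_probs[:, :, None]
    ranks = 1 + higher.sum(axis=2)
    return ranks <= k
```

Here `cal_probs` and `test_probs` are assumed to be arrays of per-label scores, one row per example, as returned by a typical `predict_proba`-style call. Only the ordering of the scores within a row matters, which is why this kind of score does not depend on calibrated probabilities. Because ranks are integers, sets built this way can be conservative; this sketch makes no attempt to correct for that.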