This paper develops novel conformal prediction methods for classification tasks that automatically adapt to random label contamination in the calibration sample, yielding more informative prediction sets with stronger coverage guarantees than state-of-the-art approaches. This is made possible by a precise characterization of the effective coverage inflation (or deflation) suffered by standard conformal inference in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge of the underlying data distribution or of the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
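To make the coverage inflation mentioned above concrete, the following is a minimal sketch of standard split conformal calibration for classification, run once with clean calibration labels and once with contaminated ones. It is not the adaptive method proposed in the paper. Everything in it is an illustrative assumption: the synthetic classifier (`simulate`), the contamination model in which each calibration label is replaced by a uniformly random class with probability `eps`, the score `1 - p_hat[y]`, and all parameter values.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def simulate(n, K, rng, signal=3.0):
    """Hypothetical classifier: Gaussian logits with a boost on the true class."""
    y = rng.integers(0, K, size=n)
    logits = rng.normal(size=(n, K))
    logits[np.arange(n), y] += signal
    return softmax(logits), y


def contaminate(y, K, eps, rng):
    """Assumed noise model: with probability eps, replace a label by a uniform draw."""
    flip = rng.random(y.shape[0]) < eps
    random_labels = rng.integers(0, K, size=y.shape[0])
    return np.where(flip, random_labels, y)


def calibrate(probs, labels, alpha):
    """Standard split conformal threshold from scores 1 - p_hat[label]."""
    n = labels.shape[0]
    scores = 1.0 - probs[np.arange(n), labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points: include every label
    return np.sort(scores)[k - 1]


def coverage_and_size(probs, y, qhat):
    """Empirical coverage of the true label and average prediction-set size."""
    sets = (1.0 - probs) <= qhat  # boolean matrix: label k is in the set
    covered = sets[np.arange(y.shape[0]), y]
    return covered.mean(), sets.sum(axis=1).mean()


rng = np.random.default_rng(0)
K, alpha, eps = 10, 0.1, 0.2

p_cal, y_cal = simulate(2000, K, rng)
p_test, y_test = simulate(5000, K, rng)
y_cal_noisy = contaminate(y_cal, K, eps, rng)

q_clean = calibrate(p_cal, y_cal, alpha)
q_noisy = calibrate(p_cal, y_cal_noisy, alpha)

# Coverage is always evaluated against the clean test labels.
cov_clean, size_clean = coverage_and_size(p_test, y_test, q_clean)
cov_noisy, size_noisy = coverage_and_size(p_test, y_test, q_noisy)

print(f"clean calibration: coverage={cov_clean:.3f}, mean set size={size_clean:.2f}")
print(f"noisy calibration: coverage={cov_noisy:.3f}, mean set size={size_noisy:.2f}")
```

In this toy setting the contaminated labels tend to receive low predicted probability, so their scores push the calibrated threshold upward. The resulting prediction sets are larger and cover the true label more often than the nominal `1 - alpha` level requires. This is the inflation case; the abstract notes that deflation is also possible under other contamination processes.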