This paper develops novel conformal prediction methods for classification tasks that automatically adapt to random label contamination in the calibration sample, yielding more informative prediction sets with stronger coverage guarantees than state-of-the-art approaches. This is made possible by a precise theoretical characterization of the effective coverage inflation (or deflation) suffered by standard conformal inference in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge of the data distribution or the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
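To make the setting concrete, the following is a minimal sketch of standard split-conformal calibration for classification, run once with clean calibration labels and once with contaminated ones. It is not the adaptive method proposed in the paper. Everything in it is an illustrative assumption: the simulated classifier (`simulate_point`), the uniform label-replacement noise model (`contaminate`), the score `1 - p_k`, and all parameter values. Coverage is measured against the true test labels, so comparing the two runs gives a rough way to explore the coverage inflation or deflation that the abstract refers to.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: every label must be included.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_set(probs, qhat):
    """Keep every label whose score 1 - p_k does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


def simulate_point(rng, num_classes, signal):
    """Hypothetical stand-in for a classifier (an assumption, not from the paper):
    softmax of Gaussian logits with a boost of `signal` on the true label."""
    y = rng.randrange(num_classes)
    logits = [rng.gauss(0.0, 1.0) + (signal if k == y else 0.0)
              for k in range(num_classes)]
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights], y


def contaminate(rng, y, num_classes, eps):
    """Assumed noise model: with probability eps, replace the label
    by one drawn uniformly at random from all classes."""
    return rng.randrange(num_classes) if rng.random() < eps else y


def run_demo(seed=0, n_cal=2000, n_test=2000, num_classes=10,
             alpha=0.1, eps=0.2, signal=2.0):
    """Return {name: (coverage, average set size)} for clean and
    contaminated calibration labels, evaluated on clean test labels."""
    rng = random.Random(seed)
    cal = [simulate_point(rng, num_classes, signal) for _ in range(n_cal)]
    test = [simulate_point(rng, num_classes, signal) for _ in range(n_test)]
    results = {}
    for name, noise in (("clean", 0.0), ("contaminated", eps)):
        # Calibration scores use the (possibly contaminated) observed labels.
        scores = [1.0 - probs[contaminate(rng, y, num_classes, noise)]
                  for probs, y in cal]
        qhat = conformal_quantile(scores, alpha)
        sets = [prediction_set(probs, qhat) for probs, _ in test]
        coverage = sum(y in s for s, (_, y) in zip(sets, test)) / n_test
        avg_size = sum(len(s) for s in sets) / n_test
        results[name] = (coverage, avg_size)
    return results


if __name__ == "__main__":
    for name, (coverage, avg_size) in run_demo().items():
        print(f"{name}: coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

Varying `eps` and `signal` in `run_demo` shows how the calibrated threshold, and with it the coverage and set size, reacts to the contamination level in this toy setup.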