We introduce a framework for robust uncertainty quantification in situations where the labeled training data are corrupted by noisy or missing labels. We build on conformal prediction, a statistical tool for generating prediction sets that cover the test label with a pre-specified probability. The validity of conformal prediction, however, rests on an i.i.d. assumption, which fails in our setting due to the corruptions in the data. To account for this distribution shift, the privileged conformal prediction (PCP) method was proposed to leverage privileged information (PI) -- additional features available only during training -- to re-weight the data distribution, yielding valid prediction sets under the assumption that the weights are accurate. In this work, we analyze the robustness of PCP to inaccuracies in these weights. Our analysis indicates that PCP can still yield valid uncertainty estimates even when the weights are poorly estimated. Furthermore, we introduce uncertain imputation (UI), a new conformal method that does not rely on weight estimation; instead, it imputes corrupted labels in a way that preserves their uncertainty. Our approach is supported by theoretical guarantees and validated empirically on both synthetic and real benchmarks. Finally, we show that these techniques can be integrated into a triply robust framework, ensuring statistically valid predictions as long as at least one of the underlying methods is valid.
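To make the conformal prediction baseline concrete, the following is a minimal sketch of standard split conformal regression, which the abstract builds on: fit a model on one half of the data, compute nonconformity scores (here absolute residuals) on a held-out calibration half, and form intervals using the appropriate empirical quantile. The synthetic data, the quadratic model, and alpha = 0.1 are illustrative assumptions, not taken from the paper; the paper's own methods (PCP, UI) modify this recipe to handle corrupted labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only; not from the paper).
n = 2000
X = rng.uniform(-2, 2, size=(n, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=n)

# Split into a proper training half and a calibration half.
X_tr, y_tr = X[: n // 2], y[: n // 2]
X_cal, y_cal = X[n // 2 :], y[n // 2 :]

# A deliberately simple model: least squares on the feature x^2.
phi = lambda Z: np.column_stack([np.ones(len(Z)), Z[:, 0] ** 2])
beta, *_ = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)
predict = lambda Z: phi(Z) @ beta

# Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(X_cal))

# Conformal quantile at miscoverage level alpha: the
# ceil((n+1)(1-alpha))/n empirical quantile of the scores.
alpha = 0.1
n_cal = len(scores)
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

# Prediction interval for fresh test points: [yhat - q, yhat + q].
# Under i.i.d. (uncorrupted) data, this covers y with prob >= 1 - alpha.
X_test = rng.uniform(-2, 2, size=(500, 1))
y_test = X_test[:, 0] ** 2 + rng.normal(scale=0.3, size=500)
yhat = predict(X_test)
covered = np.mean((y_test >= yhat - q) & (y_test <= yhat + q))
```

The guarantee is marginal and distribution-free but hinges on exchangeability between calibration and test points; label corruption breaks exactly this, which is the gap the weighting (PCP) and imputation (UI) approaches address.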