Prediction sets capture uncertainty by predicting sets of labels rather than individual labels, enabling downstream decisions to conservatively account for all plausible outcomes. Conformal inference algorithms construct prediction sets that are guaranteed to contain the true label with high probability. However, these guarantees fail under distribution shift, which is precisely when reliable uncertainty quantification is most useful. We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting. The method estimates the predicted class probabilities in the target domain, as well as the confusion matrix, and then propagates the uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for the importance weights. Finally, it uses these intervals to construct prediction sets. We evaluate our approach on five datasets: the CIFAR-10, ChestX-Ray, and Entity-13 image datasets, the tabular CDC Heart dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller, more informative prediction sets than several baselines.
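To make the importance-weight step concrete, here is a minimal Python sketch of the point-estimate version only. It assumes the standard label shift identity q = C w, where C[i][j] is the source-domain joint probability of predicting class i when the true label is j, q[i] is the target-domain probability of predicting class i, and w[j] is the importance weight of class j. The function names (`confusion_matrix`, `predicted_distribution`, `gaussian_solve`, `importance_weights`) are hypothetical and chosen for illustration. The sketch does not propagate confidence intervals through the elimination and therefore carries no PAC guarantee; it only illustrates the linear system that the interval computation is built around.

```python
from typing import List, Sequence


def confusion_matrix(preds: Sequence[int], labels: Sequence[int], k: int) -> List[List[float]]:
    """Empirical C[i][j] = P_source(prediction = i, label = j)."""
    n = len(labels)
    c = [[0.0] * k for _ in range(k)]
    for p, y in zip(preds, labels):
        c[p][y] += 1.0 / n
    return c


def predicted_distribution(preds: Sequence[int], k: int) -> List[float]:
    """Empirical q[i] = P_target(prediction = i), from unlabeled target data."""
    n = len(preds)
    q = [0.0] * k
    for p in preds:
        q[p] += 1.0 / n
    return q


def gaussian_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= factor * m[col][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x


def importance_weights(src_preds: Sequence[int], src_labels: Sequence[int],
                       tgt_preds: Sequence[int], k: int) -> List[float]:
    """Point estimates of w[j] = P_target(y = j) / P_source(y = j) (no intervals)."""
    c = confusion_matrix(src_preds, src_labels, k)
    q = predicted_distribution(tgt_preds, k)
    return gaussian_solve(c, q)
```

For example, with a perfect two-class classifier, balanced source labels `[0, 0, 1, 1]`, and target predictions `[0, 1, 1, 1]`, `importance_weights` returns weights close to `[0.5, 1.5]`, reflecting that class 1 is three times as common as class 0 in the target domain. Replacing each entry of C and q with an interval and carrying those intervals through the same elimination steps is the idea the abstract describes; that part is not implemented here.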