Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results characterize the reliability of the prediction sets in terms of the severity of distributional changes and the size of the training dataset.
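To make the distinction between the two notions of coverage concrete, the following display is a sketch in notation assumed here rather than taken from the paper: $D_n$ denotes the training dataset of size $n$, $\widehat{C}_n$ the prediction set built from it, $(X, Y)$ a test point drawn from the shifted distribution, $1-\alpha$ the nominal level, and $\varepsilon, \delta$ generic slack and failure-probability terms whose exact dependence on $n$ and on the severity of the shift is left unspecified.

\[
\begin{aligned}
&\text{Marginal coverage:} && \mathbb{P}\bigl(Y \in \widehat{C}_n(X)\bigr) \ge 1-\alpha, \\
&\text{Training-conditional coverage:} && \mathrm{Cov}(D_n) := \mathbb{P}\bigl(Y \in \widehat{C}_n(X) \,\big|\, D_n\bigr), \\
&\text{PAC-type guarantee:} && \mathbb{P}\bigl(\mathrm{Cov}(D_n) < 1-\alpha-\varepsilon\bigr) \le \delta .
\end{aligned}
\]

The marginal probability averages over both the training data and the test point, whereas $\mathrm{Cov}(D_n)$ is a random variable that depends on the training data actually drawn. A tail bound of the third form limits how often that draw yields a prediction set whose coverage falls noticeably below the nominal level.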