Conformal prediction (CP) is a method for constructing a prediction interval around the output of a fitted model, whose validity does not rely on the model being correct. The CP interval offers a distribution-free coverage guarantee, but it relies on the training data being drawn from the same distribution as the test data. A recent variant, weighted conformal prediction (WCP), reweights the method to allow for covariate shift between the training and test distributions. However, WCP requires knowledge of the nature of the covariate shift, specifically the likelihood ratio between the test and training covariate distributions. In practice, since this likelihood ratio is estimated rather than known exactly, the coverage guarantee may degrade due to the estimation error. In this paper, we consider a special scenario where observations belong to a finite number of groups, and these groups determine the covariate shift between the training and test distributions; for instance, this may arise if the training set is collected via stratified sampling. Our results demonstrate that in this special case, the predictive coverage guarantees of WCP can be drastically improved beyond what existing estimation error bounds provide.
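To make the group-based setting concrete, the following is a minimal sketch of split WCP when the likelihood ratio depends only on group membership. It is an illustration under stated assumptions, not the paper's procedure: the function names (`group_weights`, `weighted_quantile`, `wcp_interval`) are hypothetical, the likelihood ratio is estimated by a simple plug-in ratio of empirical group frequencies, and the conformity score is assumed to be the absolute residual on a held-out calibration set.

```python
import numpy as np

def group_weights(train_groups, test_groups, n_groups):
    """Plug-in estimate of the per-group likelihood ratio (an assumption of this sketch)."""
    p_train = np.bincount(train_groups, minlength=n_groups) / len(train_groups)
    p_test = np.bincount(test_groups, minlength=n_groups) / len(test_groups)
    # Assumes every group appears at least once in the training sample.
    return p_test / p_train

def weighted_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted score distribution,
    with the test point's weight placed as a point mass at +infinity."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    # Normalized cumulative weight of the calibration scores.
    cum = np.cumsum(weights[order]) / (weights.sum() + test_weight)
    idx = np.searchsorted(cum, 1 - alpha, side="left")
    # If the calibration mass never reaches 1 - alpha, the quantile is infinite.
    return sorted_scores[idx] if idx < len(sorted_scores) else np.inf

def wcp_interval(pred, cal_scores, cal_groups, test_group, w, alpha=0.1):
    """Symmetric interval around a point prediction, using absolute-residual scores."""
    q = weighted_quantile(cal_scores, w[cal_groups], w[test_group], alpha)
    return pred - q, pred + q
```

Here `w` is the array returned by `group_weights`, indexed by integer group labels. When a group is rare in the training set but common in the test set, its weight is large, so a test point from that group may receive a wide or even infinite interval.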