Conformal prediction (CP) is a method for constructing a prediction interval around the output of a fitted model whose validity does not rely on the model being correct: the CP interval offers a distribution-free coverage guarantee, but it requires the training data to be drawn from the same distribution as the test data. A recent variant, weighted conformal prediction (WCP), reweights the method to allow for covariate shift between the training and test distributions. However, WCP requires knowledge of the nature of the covariate shift, specifically the likelihood ratio between the test and training covariate distributions. In practice, since this likelihood ratio is estimated rather than known exactly, the coverage guarantee may degrade due to the estimation error. In this paper, we consider a special scenario where observations belong to a finite number of groups, and these groups determine the covariate shift between the training and test distributions; for instance, this may arise if the training set is collected via stratified sampling. Our results demonstrate that in this special case, the predictive coverage guarantees of WCP can be drastically improved beyond those implied by existing estimation error bounds.
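To make the group-based setting concrete, the following is a minimal sketch of weighted split conformal prediction in which the likelihood ratio depends only on group membership. It is an illustration under stated assumptions, not necessarily the exact procedure analyzed in the paper: the function names (`group_weights`, `weighted_quantile`, `gwcp_interval`) are hypothetical, the nonconformity score is assumed to be the absolute residual on a held-out calibration set, the test-time group proportions are assumed to be supplied by the user, and the training group proportions are estimated by empirical frequencies.

```python
import numpy as np


def group_weights(groups_cal, test_group_probs):
    """Estimate the likelihood ratio w(g) = P_test(g) / P_train(g) per group.

    P_train(g) is estimated by the empirical group frequency in the
    calibration data; P_test(g) is assumed to be given (hypothetical input).
    """
    labels, counts = np.unique(np.asarray(groups_cal), return_counts=True)
    train_probs = dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))
    return {g: test_group_probs.get(g, 0.0) / p for g, p in train_probs.items()}


def weighted_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha)-quantile of the weighted score distribution.

    The distribution places normalized mass on each calibration score and
    the remaining mass (from the test point's weight) at +infinity.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    probs = weights[order] / (weights.sum() + test_weight)
    cum = np.cumsum(probs)
    # First index whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return sorted_scores[idx] if idx < len(sorted_scores) else np.inf


def gwcp_interval(pred_test, group_test, resid_cal, groups_cal,
                  test_group_probs, alpha=0.1):
    """Prediction interval pred_test +/- q for a test point in group_test.

    Assumes group_test also appears in the calibration data.
    """
    w = group_weights(groups_cal, test_group_probs)
    w_cal = np.array([w[g] for g in np.asarray(groups_cal).tolist()])
    q = weighted_quantile(np.abs(resid_cal), w_cal, w[group_test], alpha)
    return pred_test - q, pred_test + q
```

For example, `gwcp_interval(10.0, "b", resid_cal, groups_cal, {"a": 0.2, "b": 0.8})` would upweight calibration points from group `"b"` when that group is more common at test time than in the calibration sample. When the test point's normalized weight exceeds `alpha`'s remaining mass, the quantile is infinite and the interval is the whole real line, which is the usual conservative behavior of WCP.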