Conformal prediction (CP) is a method for constructing a prediction interval around the output of a fitted model, whose validity does not rely on the model being correct: the CP interval offers a distribution-free coverage guarantee, but it relies on the training data being drawn from the same distribution as the test data. A recent variant, weighted conformal prediction (WCP), reweights the method to allow for covariate shift between the training and test distributions. However, WCP requires knowledge of the nature of the covariate shift, specifically the likelihood ratio between the test and training covariate distributions. In practice, since this likelihood ratio is estimated rather than known exactly, the coverage guarantee may degrade due to the estimation error. In this paper, we consider a special scenario where observations belong to a finite number of groups, and these groups determine the covariate shift between the training and test distributions; for instance, this may arise if the training set is collected via stratified sampling. Our results demonstrate that in this special case, the predictive coverage guarantees of WCP can be drastically improved beyond those implied by existing estimation error bounds.
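To make the group setting concrete, the following is a minimal sketch of standard weighted split conformal prediction in which the likelihood ratio depends only on a group label and is estimated from group frequencies. It is not taken from the paper: the function names `group_weights` and `weighted_quantile` are hypothetical, and the sketch assumes integer group labels, that every group appears in the training sample, and that group labels are available for a sample of test points.

```python
import numpy as np


def group_weights(train_groups, test_groups, n_groups):
    """Estimate the likelihood ratio w(g) = P_test(g) / P_train(g) per group.

    Assumption: labels are integers in [0, n_groups) and every group
    occurs at least once in train_groups (otherwise we divide by zero).
    """
    train_freq = np.bincount(train_groups, minlength=n_groups) / len(train_groups)
    test_freq = np.bincount(test_groups, minlength=n_groups) / len(test_groups)
    return test_freq / train_freq


def weighted_quantile(scores, cal_weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted score distribution.

    The distribution places mass proportional to cal_weights[i] on
    scores[i] and mass proportional to test_weight on +infinity, as in
    weighted split conformal prediction.
    """
    scores = np.asarray(scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    sorted_weights = cal_weights[order]
    total = sorted_weights.sum() + test_weight
    cum_mass = np.cumsum(sorted_weights) / total
    # First index whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cum_mass, 1 - alpha, side="left")
    # If the finite scores never reach 1 - alpha, the quantile is +infinity.
    return sorted_scores[idx] if idx < len(sorted_scores) else np.inf
```

With absolute residuals as calibration scores, the interval for a test point in group `g` would be the point prediction plus or minus `weighted_quantile(scores, w[cal_groups], w[g], alpha)`, where `w = group_weights(...)` and `cal_groups` holds the group labels of the calibration points.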