In this work, we consider the problem of building distribution-free prediction intervals with finite-sample conditional coverage guarantees. Conformal prediction (CP) is an increasingly popular framework for building prediction intervals with distribution-free guarantees, but these guarantees only ensure marginal coverage: the probability of coverage is averaged over a random draw of both the training and test data, so there may be substantial undercoverage within certain subpopulations. Ideally, we would instead want local coverage guarantees that hold for each possible value of the test point's features. While the impossibility of achieving pointwise local coverage is well established in the literature, many variants of the conformal prediction algorithm show favorable local coverage properties empirically. Relaxing the definition of local coverage can allow for a theoretical understanding of this empirical phenomenon. We aim to bridge this gap between theoretical validation and empirical performance by proving achievable and interpretable guarantees for a relaxed notion of local coverage. Building on the localized CP method of Guan (2023) and the weighted CP framework of Tibshirani et al. (2019), we propose a new method, randomly-localized conformal prediction (RLCP), which returns prediction intervals that are not only marginally valid but also satisfy a relaxed local coverage guarantee and retain coverage guarantees under covariate shift. Through a series of simulations and real data experiments, we validate these coverage guarantees of RLCP and compare it with other local conformal prediction methods.
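To make the distinction between marginal and local coverage concrete, the following is a minimal sketch of standard split conformal prediction. It is not RLCP and not the authors' code. The data model (noise scale `0.1 + 2x`), the subgroup `x > 0.9`, the fixed point prediction of 0, and the function names `conformal_halfwidth`, `draw`, and `simulate` are all assumptions made for illustration only.

```python
import math
import random


def conformal_halfwidth(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha
        return math.inf
    return sorted(scores)[k - 1]


def draw(rng):
    """Hypothetical data model: the noise scale grows with the feature x."""
    x = rng.random()
    y = rng.gauss(0.0, 0.1 + 2.0 * x)
    return x, y


def simulate(seed=0, n_cal=2000, n_test=20000, alpha=0.1):
    """Return (marginal coverage, coverage within the subgroup x > 0.9)."""
    rng = random.Random(seed)
    # The point prediction is fixed at 0 (the true conditional mean),
    # so the conformity score of a point is simply |y|.
    cal_scores = [abs(draw(rng)[1]) for _ in range(n_cal)]
    q = conformal_halfwidth(cal_scores, alpha)

    covered = covered_hi = n_hi = 0
    for _ in range(n_test):
        x, y = draw(rng)
        hit = abs(y) <= q  # is y inside the interval [-q, q]?
        covered += hit
        if x > 0.9:  # high-noise subpopulation
            n_hi += 1
            covered_hi += hit
    return covered / n_test, covered_hi / n_hi


if __name__ == "__main__":
    marginal, local = simulate()
    print(f"marginal coverage:    {marginal:.3f}")
    print(f"coverage for x > 0.9: {local:.3f}")
```

Under this assumed model, the single interval width `q` is calibrated on the pooled scores, so the marginal figure should land close to the nominal level 1 - alpha while the high-noise subgroup is covered noticeably less often. This is the kind of subpopulation undercoverage that a marginal guarantee permits.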