Despite attractive theoretical guarantees and practical successes, the Predictive Intervals (PIs) given by Conformal Prediction (CP) may not reflect the uncertainty of a given model. This limitation arises because CP methods apply a constant correction to all test points to ensure coverage, disregarding their individual uncertainties. To address this issue, we propose using a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and using the QRF's weights to assign more importance to samples whose residuals are similar to that of the test point. This approach yields PI lengths that are better aligned with the model's uncertainty. In addition, the weights learned by the QRF provide a partition of the feature space, allowing for more efficient computation and improved adaptiveness of the PIs through groupwise conformalization. Our approach enjoys assumption-free finite-sample marginal and training-conditional coverage and, under suitable assumptions, also ensures conditional coverage. Our methods work for any nonconformity score and are available as a Python package. We conduct experiments on simulated and real-world data that demonstrate significant improvements compared to existing methods.
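To make the contrast between a constant correction and a reweighted one concrete, the following minimal sketch compares the standard split-conformal quantile with a weighted empirical quantile of the same nonconformity scores. It is an illustration under stated assumptions, not the package's implementation: the function names, the toy data, and in particular `stand_in_weights` are hypothetical. The latter replaces the QRF weights with a crude indicator of whether two points fall on the same side of x = 0.5, which mimics a forest made of a single one-split tree.

```python
import math


def conformal_quantile(scores, alpha):
    """Constant split-CP correction: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[k - 1]


def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative normalized weight reaches `level`."""
    total = sum(weights)
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= level:
            return score
    return math.inf


def stand_in_weights(x_test, cal_x, split=0.5):
    """Hypothetical stand-in for QRF weights: 1 if a calibration point lies
    on the same side of `split` as the test point, else 0."""
    return [1.0 if (x < split) == (x_test < split) else 0.0 for x in cal_x]


# Toy calibration set (assumed data): small residuals for x < 0.5, large otherwise.
cal_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_scores = [0.1, 0.2, 0.15, 0.05, 1.0, 1.2, 0.8, 1.5]
alpha = 0.25

# One correction shared by every test point.
q_constant = conformal_quantile(cal_scores, alpha)

# Test-point-specific corrections from the reweighted score distribution.
q_low = weighted_quantile(cal_scores, stand_in_weights(0.25, cal_x), 1 - alpha)
q_high = weighted_quantile(cal_scores, stand_in_weights(0.75, cal_x), 1 - alpha)

print(q_constant, q_low, q_high)
```

In this toy setting the constant correction is driven by the noisy region, whereas the weighted quantile for a test point in the low-noise region is much smaller. Because the indicator weights split the feature space into two groups, the sketch also hints at how a partition supports groupwise conformalization. It does not reproduce the finite-sample guarantees stated in the abstract.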