Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
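To make the weighted-quantile idea concrete, the following is a minimal sketch for the split conformal setting, not the paper's reference implementation. It assumes a model fitted beforehand on separate data, absolute residuals computed on a calibration set, and fixed, data-independent weights in [0, 1], with the test point given weight 1 and placed at +infinity. The function names and the geometric decay factor `rho` are hypothetical choices made for illustration.

```python
import numpy as np


def weighted_conformal_quantile(residuals, weights, alpha):
    """(1 - alpha) quantile of the weighted empirical distribution of the
    calibration residuals, with an extra point mass at +inf standing in for
    the unseen test point (which carries weight 1)."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + 1.0          # normalizer includes the test point
    order = np.argsort(residuals)
    cum_mass = np.cumsum(weights[order]) / total
    # Smallest residual whose cumulative normalized weight reaches 1 - alpha.
    idx = np.searchsorted(cum_mass, 1.0 - alpha, side="left")
    if idx >= residuals.size:
        return np.inf                    # only the mass at +inf reaches the level
    return residuals[order][idx]


def weighted_split_interval(prediction, residuals, weights, alpha=0.1):
    """Symmetric interval around a point prediction (illustrative helper)."""
    q = weighted_conformal_quantile(residuals, weights, alpha)
    return prediction - q, prediction + q


# Illustrative usage: the noise scale drifts upward over time, and geometric
# weights (an assumed choice) make recent residuals count for more.
rng = np.random.default_rng(0)
n = 500
residuals = np.abs(rng.normal(scale=np.linspace(1.0, 3.0, n)))
rho = 0.99                                # hypothetical decay factor
weights = rho ** np.arange(n, 0, -1)      # most recent point gets weight rho
lo, hi = weighted_split_interval(0.0, residuals, weights, alpha=0.1)
```

With all weights equal to 1, the quantile reduces to the usual split conformal quantile. Down-weighting older points shrinks the effective sample size, so the interval can become infinite when `alpha` is small relative to the total weight.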