Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
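To make the weighted-quantile idea concrete, here is a minimal sketch of how a weighted conformal quantile could be computed in a split-conformal setting. The abstract does not spell out the construction, so the details below are assumptions made for illustration: the function name `weighted_conformal_quantile`, the use of absolute residuals as scores, the unit weight given to the test point (represented as a point mass at +infinity), and the geometric decay weights in the usage note are all hypothetical choices, not necessarily the authors' exact method.

```python
import numpy as np


def weighted_conformal_quantile(residuals, weights, alpha):
    """Return the (1 - alpha) quantile of a weighted empirical distribution.

    Assumed construction (not taken from the abstract): calibration residual
    i carries weight weights[i], the test point carries weight 1 and sits at
    +infinity, and all weights are normalized to sum to one.
    """
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Normalizing constant includes the unit weight of the test point.
    total = weights.sum() + 1.0

    # Sort residuals and accumulate their normalized weights.
    order = np.argsort(residuals)
    cumulative = np.cumsum(weights[order]) / total

    # First sorted residual whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")

    # If the calibration mass never reaches 1 - alpha, the quantile falls on
    # the point mass at +infinity and the interval is unbounded.
    if idx >= len(residuals):
        return np.inf
    return residuals[order][idx]
```

With equal weights this reduces to the usual split-conformal quantile. Under suspected drift, one might instead pass weights that decay with the age of each calibration point, for example `weights = 0.99 ** np.arange(n, 0, -1)` so that the most recent point gets the largest weight (the decay rate 0.99 is an illustrative choice), and then form the interval as the point prediction plus or minus the returned quantile.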