Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, which contain the ground truth with a predefined probability. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which arises in many real-world scenarios. In parallel, conformal methods have been developed that provide statistical guarantees for a broader range of objectives, such as bounding the best F1-score or controlling the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its statistical similarity with the test examples; a careful choice of weights may result in tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real-world data show the usefulness of our method.
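To make the idea of weighted risk control concrete, here is a minimal sketch, not the paper's exact procedure. It assumes a grid of candidate thresholds λ, per-example calibration losses L_i(λ) that are non-increasing in λ and bounded above by B, and non-negative calibration weights w_i with the test point given weight 1. The weights are normalized as w̃_i = w_i / (w_1 + … + w_n + 1), and the smallest λ is chosen so that the weighted empirical risk plus the worst-case term w̃_{n+1}·B stays below α. The function name `weighted_crc_threshold`, the geometric decay rate `rho = 0.99`, and the synthetic drifting scores are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_crc_threshold(lambdas, losses, weights, alpha, B=1.0):
    """Hypothetical sketch of weighted (non-exchangeable) conformal risk control.

    lambdas: increasing grid of candidate thresholds, shape (m,).
    losses:  losses[i, j] = L_i(lambdas[j]), assumed non-increasing in j and <= B.
    weights: non-negative calibration weights, shape (n,); the test point gets weight 1.
    Returns the smallest lambda whose weighted risk bound is <= alpha.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")

    # Normalize so the calibration weights and the unit test weight sum to 1.
    denom = weights.sum() + 1.0
    w_cal = weights / denom
    w_test = 1.0 / denom

    # Weighted empirical risk at every lambda, plus the worst case B for the unseen test loss.
    risk = w_cal @ losses + w_test * B

    # Losses are monotone, so the first feasible index gives the infimum over the grid.
    feasible = np.nonzero(risk <= alpha)[0]
    if feasible.size == 0:
        # No grid value satisfies the bound: fall back to the most conservative threshold.
        return lambdas[-1]
    return lambdas[feasible[0]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    lambdas = np.linspace(0.0, 1.0, 101)
    # Hypothetical nonconformity scores whose scale drifts upward over time.
    scores = rng.uniform(size=n) * np.linspace(0.5, 1.0, n)
    # Miscoverage loss L_i(lam) = 1{score_i > lam}: non-increasing in lam, bounded by B = 1.
    losses = (scores[:, None] > lambdas[None, :]).astype(float)
    # Hypothetical geometric weights: the most recent point gets weight rho**0 = 1.
    rho = 0.99
    weights = rho ** np.arange(n - 1, -1, -1)
    print("weighted:  ", weighted_crc_threshold(lambdas, losses, weights, alpha=0.1))
    print("unweighted:", weighted_crc_threshold(lambdas, losses, np.ones(n), alpha=0.1))
```

With uniform weights the rule reduces to standard (exchangeable) conformal risk control; down-weighting older calibration points lets the threshold track the drifted score distribution, at the price of a smaller effective sample size.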