Selective conformal prediction aims to construct prediction sets with valid coverage for a test unit conditional on it being selected by a data-driven mechanism. While existing methods in the offline setting handle any selection mechanism that is permutation invariant to the labeled data, their extension to the online setting -- where data arrives sequentially and later decisions depend on earlier ones -- is challenged by the fact that the selection mechanism is naturally asymmetric. As a result, existing methods address only a limited collection of selection mechanisms. In this paper, we propose PErmutation-based Mondrian Conformal Inference (PEMI), a general permutation-based framework for selective conformal prediction with arbitrary asymmetric selection rules. Motivated by full and Mondrian conformal prediction, PEMI identifies all permutations of the observed data (or a Monte-Carlo subset thereof) that lead to the same selection event, and calibrates a prediction set using conformity scores over this selection-preserving reference set. Under standard exchangeability conditions, our prediction sets achieve finite-sample exact selection-conditional coverage for any asymmetric selection mechanism and any prediction model. PEMI naturally incorporates additional offline labeled data, extends to selection mechanisms with multiple test samples, and controls the false coverage rate (FCR) under fine-grained selection taxonomies. We further derive several efficient instantiations for commonly used online selection rules, including covariate-based rules, procedures based on conformal p-values or e-values, and selection based on earlier outcomes. Finally, we demonstrate the efficacy of our methods across various selection rules on a real drug discovery dataset and investigate their performance via simulations.
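Because the abstract describes PEMI only at a high level, the following is a minimal Python sketch of the core idea under our own assumptions, not the authors' implementation. It assumes a fixed model `mu` trained on independent data, absolute-residual conformity scores, and a hypothetical sliding-window rule (`window_rule`) that selects a unit when its prediction exceeds the mean of recently observed labels, an asymmetric rule based on earlier outcomes. The reference set is a Monte-Carlo subset: uniformly random permutations are drawn and kept only if the unit moved into the test position would still be selected. The names `window_rule`, `pemi_pvalue`, and `pemi_set` are illustrative.

```python
import numpy as np


def window_rule(xs, ys, mu, window=20):
    """Hypothetical asymmetric online rule (illustrative assumption):
    select the last unit if its prediction exceeds the mean of the
    `window` most recent labels that arrived before it."""
    return mu(xs[-1]) > np.mean(ys[-1 - window:-1])


def pemi_pvalue(x_cal, y_cal, x_test, y_hyp, mu, select_last, n_perm=500, seed=0):
    """Monte-Carlo selection-conditional conformal p-value for one hypothesized
    test label. Assumes mu is fixed (trained on independent data) and that the
    observed ordering selects the test unit."""
    rng = np.random.default_rng(seed)
    xs = np.append(x_cal, x_test)          # observed order: labeled stream, then test unit
    ys = np.append(y_cal, y_hyp)           # test label set to the hypothesized value
    scores = np.abs(ys - mu(xs))           # absolute-residual conformity scores
    test_score = scores[-1]
    ref = [test_score]                     # the observed ordering is selection-preserving
    for _ in range(n_perm):
        perm = rng.permutation(len(xs))
        # Keep the draw only if the unit placed in the test slot would be selected.
        if select_last(xs[perm], ys[perm], mu):
            ref.append(scores[perm[-1]])
    return np.mean(np.asarray(ref) >= test_score)


def pemi_set(x_cal, y_cal, x_test, mu, select_last, y_grid, alpha=0.1, **kw):
    """Full-conformal style set: grid values whose p-value exceeds alpha."""
    return np.array([y for y in y_grid
                     if pemi_pvalue(x_cal, y_cal, x_test, y, mu, select_last, **kw) > alpha])


def mu(x):
    """Stand-in for a pre-trained prediction model (assumption)."""
    return 2.0 * x


# Usage on synthetic data.
rng = np.random.default_rng(1)
x_cal = rng.normal(size=200)
y_cal = 2.0 * x_cal + rng.normal(size=200)
x_test = 2.0
# The sketch only applies to test units that the rule actually selects.
selected = window_rule(np.append(x_cal, x_test), np.append(y_cal, 0.0), mu)
pred_set = pemi_set(x_cal, y_cal, x_test, mu, window_rule,
                    y_grid=np.linspace(-6.0, 14.0, 41))
```

In this sketch, the observed ordering is always included and permutations are drawn uniformly, so under exchangeability the accepted orderings are exchangeable with the observed one given the selection event; this is what makes the rank-based p-value valid. A grid over candidate labels is needed, as in full conformal prediction, because the hypothesized label enters the selection rule once the test unit is permuted into an earlier position.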