Decision-making and scientific discovery pipelines, such as job hiring and drug discovery, often involve multiple stages: before any resource-intensive step, an initial screening typically uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values. We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units. Building upon the conformal inference framework, our method first constructs p-values that quantify the statistical evidence for large outcomes; it then determines the shortlist by comparing the p-values to a threshold introduced in the multiple testing literature. In many cases, the procedure selects candidates whose predictions are above a data-dependent threshold. Our theoretical guarantee holds under mild exchangeability conditions on the samples, generalizing existing results on multiple conformal p-values. We demonstrate the empirical performance of our method via simulations, and apply it to job hiring and drug discovery datasets.
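The two-step recipe described in the abstract (conformal p-values, then a multiple-testing threshold) can be illustrated with a minimal sketch. Several choices here are assumptions and are not stated in the text above: the score is taken to be the residual `y - prediction`, the multiple-testing threshold is taken to be the Benjamini-Hochberg step-up rule, ties are handled deterministically, and the function names `conformal_pvalues` and `bh_select` are hypothetical. The sketch also presumes a held-out calibration set with observed outcomes that is exchangeable with the test candidates.

```python
import numpy as np


def conformal_pvalues(calib_pred, calib_y, test_pred, thresholds):
    """P-values for the nulls H_j: (unobserved) outcome of test unit j <= thresholds[j].

    Assumed score: V(x, y) = y - prediction(x), which is increasing in y.
    Calibration scores use the observed outcomes; test scores plug in the
    user-specified threshold in place of the unobserved outcome.
    """
    calib_scores = np.asarray(calib_y, dtype=float) - np.asarray(calib_pred, dtype=float)
    test_scores = np.asarray(thresholds, dtype=float) - np.asarray(test_pred, dtype=float)
    n = calib_scores.size
    sorted_scores = np.sort(calib_scores)
    # Number of calibration scores strictly below each test score.
    below = np.searchsorted(sorted_scores, test_scores, side="left")
    return (1.0 + below) / (n + 1.0)


def bh_select(pvalues, q):
    """Benjamini-Hochberg step-up rule at level q (assumed threshold rule).

    Returns the sorted indices of the selected test units.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Find the largest k such that the k-th smallest p-value is <= q * k / m.
    passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1
    return np.sort(order[:k])
```

With this residual score, a test unit receives a small p-value when its prediction is well above its threshold relative to the calibration residuals, which is consistent with the abstract's remark that the procedure often selects candidates whose predictions exceed a data-dependent cutoff. A typical call would be `bh_select(conformal_pvalues(calib_pred, calib_y, test_pred, thresholds), q=0.1)`, where `q` is the target level for the proportion of false selections.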