This paper introduces novel weighted conformal p-values and accompanying methods for model-free selective inference. The problem is as follows: given test units with observed covariates $X$ and missing responses $Y$, how do we select the units whose responses $Y$ exceed user-specified values while controlling the proportion of false positives? Can we do so without any modeling assumptions on the data and without restricting the model used to predict the responses? Finally, the methods should remain applicable under covariate shift between training and test data, which commonly occurs in practice. We answer these questions by first leveraging any prediction model to produce a class of well-calibrated weighted conformal p-values, which control the type-I error in detecting a large response. These p-values cannot be passed directly to classical multiple testing procedures, since they may not obey a well-known positive dependence property. Hence, we introduce weighted conformalized selection (WCS), a new procedure that controls the false discovery rate (FDR) in finite samples. Besides prediction-assisted candidate selection, WCS (1) allows inference on multiple individual treatment effects, and (2) extends to outlier detection with inlier distribution shifts. We demonstrate its performance via simulations and applications to causal inference, drug discovery, and outlier detection datasets.
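To make the notion of a weighted conformal p-value concrete, the following is a minimal sketch of one common deterministic form, not necessarily the exact definition used in the paper (which may, for example, include randomized tie-breaking). It assumes a score $V(x, y)$ that is monotone in $y$, such as $y - \hat\mu(x)$ for a fitted predictor $\hat\mu$; calibration scores $V_i = V(X_i, Y_i)$; test scores $\hat V_j = V(X_{n+j}, c_j)$ evaluated at the user-specified threshold $c_j$; and covariate-shift weights $w(x)$ given by the test-to-training density ratio. The function name and argument names are hypothetical. The selection step of WCS is not shown.

```python
import numpy as np

def weighted_conformal_pvalues(cal_scores, cal_weights, test_scores, test_weights):
    """Sketch of deterministic weighted conformal p-values (assumed form).

    p_j = (sum_i w_i * 1{V_i < Vhat_j} + w_j) / (sum_i w_i + w_j)

    cal_scores   : V_i = V(X_i, Y_i) on labeled calibration data
    cal_weights  : w(X_i), assumed density-ratio weights for covariate shift
    test_scores  : Vhat_j = V(X_{n+j}, c_j), score at the threshold c_j
    test_weights : w(X_{n+j})
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    test_weights = np.asarray(test_weights, dtype=float)

    # below[j, i] is True when calibration score i is strictly below test score j.
    below = cal_scores[None, :] < test_scores[:, None]

    # Weighted count of calibration scores below each test score,
    # plus the test point's own weight.
    numer = (below * cal_weights[None, :]).sum(axis=1) + test_weights
    denom = cal_weights.sum() + test_weights
    return numer / denom
```

With all weights equal to one, this reduces to the usual unweighted conformal p-value. A small p-value indicates that few calibration scores fall below the test score at the threshold, which is evidence that the unobserved response exceeds $c_j$.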