Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern is that, even if the trial was well executed (i.e., internal validity holds), the study participants may not be a random sample of the target population (i.e., external validity fails), so the learned policy may perform suboptimally on the target population. We consider a model in which observable attributes can affect sample selection probabilities arbitrarily, but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive a partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid the finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require separate estimation of nuisance parameters.
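As a rough illustration of why a tail quantity such as CVaR can arise in a worst-case analysis, the sketch below computes an empirical lower-tail CVaR and the smallest reweighted mean of a sample when each observation's weight is confined to an interval. This is a simplified toy setting and not the model or estimator of the paper: the function names `lower_cvar` and `worst_case_mean`, the weight bounds `lo` and `hi`, and the absence of covariates are all assumptions made for illustration.

```python
import numpy as np

def lower_cvar(y, alpha):
    """Empirical lower-tail CVaR: mean of the ceil(alpha * n) smallest outcomes.

    Assumes 0 < alpha <= 1.
    """
    y = np.sort(np.asarray(y, dtype=float))
    k = int(np.ceil(alpha * len(y)))
    return float(y[:k].mean())

def worst_case_mean(y, lo, hi):
    """Smallest weighted mean over weights w_i in [lo, hi] that average to 1.

    Assumes lo <= 1 <= hi. Every weight starts at `lo`; the remaining
    weight budget is then spent on the smallest outcomes first, each
    capped at `hi`, which minimizes the weighted mean.
    """
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    w = np.full(n, float(lo))
    budget = n * (1.0 - lo)  # total weight still to be distributed
    for i in range(n):
        add = min(hi - lo, budget)
        w[i] += add
        budget -= add
    return float(np.dot(w, y) / n)

outcomes = [0.0, 1.0, 2.0, 3.0]
print(lower_cvar(outcomes, 0.5))            # mean of the two smallest outcomes
print(worst_case_mean(outcomes, 0.5, 2.0))  # adversarially reweighted mean
```

In this toy setting the minimizing weights place as much mass as allowed on the lowest outcomes, so the worst-case mean combines the ordinary mean with an average over the lower tail. The abstract's statement that optimal policies depend on both the CATE and the CVaR has a similar flavor, although the paper's conditions and results are conditional on covariates and are not reproduced here.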