We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. We thereby address a key limitation of standard policy learning, which assumes unconfoundedness, i.e., that no unobserved factors influence both treatment assignment and outcomes. This assumption is often violated in practice, in which case standard policy learning yields biased estimates and thus policies that can be harmful. To address this limitation, we employ causal sensitivity analysis and derive a semi-parametrically efficient estimator for a sharp bound on the value function under unobserved confounding. Our estimator has three advantages: (1) Unlike existing works, it avoids unstable minimax optimization based on inverse propensity weighted outcomes. (2) It is semi-parametrically efficient. (3) We prove that it leads to the optimal confounding-robust policy. Finally, we extend our theory to the related task of policy improvement under unobserved confounding, i.e., when a baseline policy such as the standard of care is available. Experiments with synthetic and real-world data show that our method outperforms simple plug-in approaches and existing baselines. Our method is highly relevant for decision-making in which unobserved confounding can be problematic, such as in healthcare and public policy.