Estimating conditional average treatment effects (CATE) is challenging, especially when treatment information is missing. Although this problem is widespread in practice, CATE estimation with missing treatments has received little attention. In this paper, we analyze CATE estimation in the setting with missing treatments, where unique challenges arise in the form of covariate shifts. We identify two covariate shifts in this setting: (i) a covariate shift between the treated and control populations; and (ii) a covariate shift between the populations with observed and missing treatment information. We first characterize the effect of these covariate shifts theoretically by deriving a generalization bound for CATE estimation with missing treatments. Then, motivated by this bound, we develop the missing treatment representation network (MTRNet), a novel CATE estimation algorithm that learns a balanced representation of covariates using domain adaptation. By using balanced representations, MTRNet provides more reliable CATE estimates in the covariate domains where the data are not fully observed. In various experiments with semi-synthetic and real-world data, we show that our algorithm improves over the state of the art by a substantial margin.
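For orientation, the quantities named above can be written out as follows. This is only a sketch in standard potential-outcomes notation, and every symbol here is an assumption introduced for illustration rather than the paper's own notation: X denotes the covariates, T the binary treatment, R a binary indicator that equals 1 when the treatment is observed, and Y(0), Y(1) the potential outcomes. The first line is the usual definition of CATE; lines (i) and (ii) state the two covariate shifts as differences between conditional covariate distributions.

```latex
% Notation assumed for illustration (not taken from the paper):
%   X: covariates, T \in \{0,1\}: treatment,
%   R \in \{0,1\}: 1 if the treatment is observed, 0 if it is missing,
%   Y(0), Y(1): potential outcomes.
\begin{align*}
\tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right] \\
\text{(i)}\quad  & p(x \mid T = 1) \neq p(x \mid T = 0) \\
\text{(ii)}\quad & p(x \mid R = 1) \neq p(x \mid R = 0)
\end{align*}
```

Under this reading, a balanced representation would be a map of the covariates under which the distributions in (i) and in (ii) become harder to distinguish.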