Causal inference methods based on electronic health record (EHR) databases must simultaneously handle confounding and missing data. A vast literature addresses these two issues separately, but surprisingly few papers attempt to address them simultaneously. In practice, when faced with both missing data and confounding, analysts may proceed by first imputing missing data and subsequently using outcome regression or inverse-probability weighting (IPW) to address confounding. However, little is known about the theoretical performance of such $\textit{ad hoc}$ methods. In a recent paper, Levis $\textit{et al.}$ outline a robust framework for tackling these problems together under certain identifying conditions, and introduce a pair of estimators for the average treatment effect (ATE), one of which is non-parametric efficient. In this work we present a series of simulations, motivated by a published EHR-based study of the long-term effects of bariatric surgery on weight outcomes, to investigate these new estimators and compare them to existing $\textit{ad hoc}$ methods. While the $\textit{ad hoc}$ methods perform well in certain scenarios, no single estimator is uniformly best. As such, the work of Levis $\textit{et al.}$ may serve as a reasonable default for causal inference when handling confounding and missing data together.