Causal inference methods based on electronic health record (EHR) databases must simultaneously handle confounding and missing data. A vast literature addresses these two issues separately, but surprisingly few papers attempt to address them together. In practice, when faced with both missing data and confounding, analysts may proceed by first imputing missing data and then using outcome regression or inverse-probability weighting (IPW) to address confounding. However, little is known about the theoretical performance of such $\textit{ad hoc}$ methods. In a recent paper, Levis $\textit{et al.}$ outline a robust framework for tackling these problems together under certain identifying conditions, and introduce a pair of estimators for the average treatment effect (ATE), one of which is non-parametric efficient. In this work we present a series of simulations, motivated by a published EHR-based study of the long-term effects of bariatric surgery on weight outcomes, to investigate these new estimators and compare them to existing $\textit{ad hoc}$ methods. While the latter perform well in certain scenarios, no single estimator is uniformly best. As such, the work of Levis $\textit{et al.}$ may serve as a reasonable default for causal inference when handling confounding and missing data together.
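To make the $\textit{ad hoc}$ "impute, then weight" strategy concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the estimators of Levis $\textit{et al.}$ and not the simulation design used in this work: it assumes a single binary confounder `L` that is missing completely at random, a binary treatment `A`, and a continuous outcome `Y`. The function names (`simulate`, `impute_confounder`, `ipw_ate`), the data-generating values, and the deliberately crude imputation model (which conditions on treatment only) are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def simulate(n, rng, true_ate=2.0, p_missing=0.3):
    """Toy data: L confounds A and Y; L is then masked completely at random.
    All numeric values here are illustrative assumptions."""
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if l else 0.3) else 0
        y = true_ate * a + 3.0 * l + rng.gauss(0.0, 1.0)
        l_obs = None if rng.random() < p_missing else l
        rows.append({"A": a, "L": l_obs, "Y": y})
    return rows


def impute_confounder(rows, rng):
    """Step 1 (ad hoc): single stochastic imputation of missing L.
    Draws L from the observed proportion P(L = 1 | A); a crude model
    chosen for brevity, not a recommended imputation model."""
    counts = defaultdict(lambda: [0, 0])  # A -> [n observed, n with L == 1]
    for r in rows:
        if r["L"] is not None:
            counts[r["A"]][0] += 1
            counts[r["A"]][1] += r["L"]
    completed = []
    for r in rows:
        l = r["L"]
        if l is None:
            n_obs, n_one = counts[r["A"]]
            p_one = n_one / n_obs if n_obs else 0.5
            l = 1 if rng.random() < p_one else 0
        completed.append({"A": r["A"], "L": l, "Y": r["Y"]})
    return completed


def ipw_ate(rows):
    """Step 2: normalized (Hajek) IPW estimate of the ATE.
    The propensity score is the treated fraction within each level of L."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for r in rows:
        n[r["L"]] += 1
        n_treated[r["L"]] += r["A"]
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for r in rows:
        e = n_treated[r["L"]] / n[r["L"]]
        p = e if r["A"] == 1 else 1.0 - e  # probability of the arm received
        num[r["A"]] += r["Y"] / p
        den[r["A"]] += 1.0 / p
    if den[0] == 0.0 or den[1] == 0.0:
        raise ValueError("both treatment arms must be represented")
    return num[1] / den[1] - num[0] / den[0]


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(5000, rng)
    completed = impute_confounder(data, rng)
    print("impute-then-IPW estimate:", ipw_ate(completed))
```

Because the imputation step in this sketch ignores the outcome, the two-step estimate need not recover the simulated effect even with large samples, which illustrates why the theoretical behavior of such pipelines is hard to characterize in general.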