Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to missing outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent missingness. We demonstrate how this framework can be extended to missing exposures. We further extend this framework to incorporate missingness on the baseline outcome, which induces missingness on the population of interest. To do so, we highlight the use of Counterfactual Strata Effects: causal estimands where the focus population is subject to missingness and/or impacted by the exposure. Our work is motivated by SEARCH-TB's investigation of the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. This study posed several real-world challenges: confounding, missingness on the exposure (alcohol use), missingness on the baseline outcome (defining who was at risk of TB and, thus, in the focus population), and missingness on the outcome at follow-up (capturing who acquired TB). We present a series of causal models and identification results to demonstrate the handling of missingness in these settings. We highlight the use of TMLE with Super Learner and the real-world consequences of our approach.