Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to missing outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent missingness. We demonstrate how this framework can be extended to missing exposures. We further extend this framework to incorporate missingness in the baseline outcome, which induces missingness in the population of interest (e.g., persons at risk). To do so, we highlight Counterfactual Strata Effects, a general class of causal estimands in which the focus population is subject to missingness and/or affected by the exposure. They are so named because the estimand involves conditioning on a counterfactual variable. For each setting, we present the causal model, relevant counterfactuals, causal estimand, and identification result. We illustrate with a real-data example investigating the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. We highlight the use of TMLE with Super Learner for estimation and inference and discuss the practical consequences of our approach.
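To make the identification step concrete for the simplest case the abstract builds on (a missing outcome only, with no missing exposure and no conditioning on a counterfactual stratum), here is a minimal Python sketch. It assumes a binary baseline covariate W, a binary exposure A, and an outcome-observation indicator Δ. It also assumes that A is unconfounded given W, that the outcome is missing at random given (W, A), and that positivity holds. Under those assumptions the counterfactual mean under "setting" A = a and preventing missingness is identified by E_W[E(Y | A = a, Δ = 1, W)]. The simulated data and the function names (`simulate`, `g_computation`, `complete_case_mean`) are hypothetical. The estimator is a simple nonparametric plug-in (g-computation), not the TMLE with Super Learner used in the analysis.

```python
import random

# Hypothetical data-generating process (an assumption for illustration only):
# W = binary baseline covariate, A = binary exposure,
# Delta = 1 if the outcome is observed, Y = binary outcome (None when missing).
def simulate(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.4)
        a = int(rng.random() < (0.6 if w else 0.3))        # W confounds A
        delta = int(rng.random() < (0.5 if w else 0.9))    # missingness depends on W
        y = int(rng.random() < 0.1 + 0.2 * a + 0.3 * w)    # true E[Y(a)] = 0.22 + 0.2a
        rows.append((w, a, delta, y if delta else None))
    return rows


def g_computation(rows, a):
    """Plug-in estimate of E_W[ E(Y | A=a, Delta=1, W) ] for discrete W."""
    n = len(rows)
    estimate = 0.0
    for w in sorted({r[0] for r in rows}):
        p_w = sum(1 for r in rows if r[0] == w) / n               # P(W = w)
        observed = [r[3] for r in rows if r[0] == w and r[1] == a and r[2] == 1]
        estimate += p_w * sum(observed) / len(observed)           # E(Y | A=a, Delta=1, W=w)
    return estimate


def complete_case_mean(rows, a):
    """Unadjusted mean of Y among units with A = a and an observed outcome."""
    observed = [r[3] for r in rows if r[1] == a and r[2] == 1]
    return sum(observed) / len(observed)


rows = simulate(200_000)
rd = g_computation(rows, 1) - g_computation(rows, 0)                  # close to the true 0.2
naive_rd = complete_case_mean(rows, 1) - complete_case_mean(rows, 0)  # biased upward here
print(f"g-computation RD: {rd:.3f}, complete-case RD: {naive_rd:.3f}")
```

Because both the exposure and missingness depend on W in this made-up example, the unadjusted complete-case contrast drifts away from the true risk difference of 0.2, while the plug-in estimate that standardizes over W stays close to it.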