Targeted Maximum Likelihood Estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when TMLE is combined with data-adaptive approaches is unclear. Based on the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate eight missing data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and five multiple imputation (MI) approaches using parametric or machine-learning models. Six scenarios were considered, varying in the exposure/outcome generation models (presence of confounder-confounder interactions) and the missingness mechanisms (whether the outcome influenced missingness in other variables, and presence of interaction/non-linear terms in the missingness models). Complete-case analysis and extended TMLE had small biases when the outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when the exposure/outcome generation models included interactions. Parametric MI including interactions performed best in terms of bias and variance reduction across all settings, except when the missingness models included a non-linear term. When choosing a method to handle missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and non-linearities is expected to perform well.
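To make one of the compared methods concrete, the missing covariate missing indicator method can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: covariates are assumed to be a NumPy array with NaN marking missing values, missing entries are filled with the constant 0 (an arbitrary choice), and the function name `missing_indicator_design` is hypothetical. The augmented covariate matrix would then be used in place of the original covariates when fitting the TMLE nuisance models.

```python
import numpy as np


def missing_indicator_design(X):
    """Hypothetical helper: build a missing-indicator design matrix.

    Assumes NaN marks a missing covariate value. Each missing entry is
    replaced by 0 (an illustrative choice of constant), and for every
    column that has at least one missing value, a 0/1 indicator column
    flagging the missing rows is appended to the right.
    """
    X = np.asarray(X, dtype=float)
    # Replace missing values with the fill constant 0.
    filled = np.where(np.isnan(X), 0.0, X)
    indicators = []
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        # Only columns with some missingness get an indicator.
        if miss.any():
            indicators.append(miss.astype(float))
    if indicators:
        # column_stack keeps the 2-D block and turns each 1-D
        # indicator into an extra column.
        return np.column_stack([filled] + indicators)
    return filled


# Usage: two covariates, each missing in one row.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])
print(missing_indicator_design(X))
```

Columns without any missing values receive no indicator, so a fully observed covariate matrix is returned unchanged apart from the conversion to a float array.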