In cluster-randomized trials (CRTs), missing data can arise in several ways: missing values in outcomes and baseline covariates at the individual or cluster level, or entirely missing information for non-participants. Of these, missing outcomes have attracted the most attention, and no existing method addresses all of these types of missing data simultaneously. To fill this gap, we propose a new doubly-robust estimator for the average treatment effect on a variety of scales. The proposed estimator simultaneously handles missing outcomes under a missing-at-random assumption, missing covariates without constraining the missingness mechanism, and missing cluster-population sizes via a uniform sampling mechanism. Furthermore, we detail key considerations for improving precision: specifying the optimal weights, leveraging machine learning, and modeling the treatment assignment mechanism. Finally, to evaluate the impact of violating missing-data assumptions, we contribute a new sensitivity analysis framework tailored to CRTs. Simulation studies and a real data application both demonstrate that the proposed methods handle missing data in CRTs effectively and outperform existing methods.
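To make the doubly-robust idea concrete, the following is a minimal sketch of a generic augmented inverse-probability-weighted (AIPW) mean for outcomes that are missing at random. It is not the estimator proposed here, which additionally handles missing covariates, missing cluster-population sizes, and cluster-level weighting. The function names `aipw_mean` and `contrast`, and the inputs `p_obs` (a fitted probability that the outcome is observed) and `m_hat` (a fitted outcome prediction), are hypothetical and assumed to come from models fitted elsewhere.

```python
import numpy as np


def aipw_mean(y, observed, p_obs, m_hat):
    """Generic AIPW estimate of a mean outcome when some outcomes are missing.

    y        : outcomes; entries where observed == 0 are ignored (may be NaN)
    observed : 1 if the outcome was observed, 0 if it is missing
    p_obs    : assumed fitted probability of observing the outcome
    m_hat    : assumed fitted outcome prediction given covariates
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=float)
    p_obs = np.asarray(p_obs, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    # Residuals are used only where the outcome was observed; missing
    # entries contribute 0, so NaN placeholders never enter the average.
    resid = np.where(observed == 1, y - m_hat, 0.0)
    # Outcome-model prediction plus an inverse-probability-weighted correction.
    return float(np.mean(m_hat + observed / p_obs * resid))


def contrast(mu1, mu0, scale="difference"):
    """Compare two arm-specific means on a chosen (illustrative) scale."""
    if scale == "difference":
        return mu1 - mu0
    if scale == "ratio":
        return mu1 / mu0
    raise ValueError(f"unknown scale: {scale}")
```

Applying `aipw_mean` separately to each treatment arm and passing the two results to `contrast` gives an effect estimate on the chosen scale. Under missingness at random, this generic form remains consistent if either `p_obs` or `m_hat` is correctly specified, which is the sense of "doubly robust" illustrated here.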