In cluster-randomized trials (CRTs), missing data can arise in several ways, including missing values in outcomes and baseline covariates at the individual or cluster level, as well as completely missing information for non-participants. Of these types, missing outcomes have attracted the most attention. However, because of their complexity, no existing method addresses all of these types of missing data simultaneously. This methodological gap may lead to confusion and potential pitfalls in the analysis of CRTs. In this article, we propose a doubly robust estimator for a variety of estimands that simultaneously handles missing outcomes under a missing-at-random assumption, missing covariates with the missing-indicator method (with no constraint on missing covariate distributions), and missing cluster-population sizes via a uniform sampling framework. Furthermore, we provide three approaches to improve precision: choosing the optimal weights for intracluster correlation, leveraging machine learning, and modeling the propensity score for treatment assignment. To evaluate the impact of violated missing-data assumptions, we additionally propose a sensitivity analysis that measures when missing data alter the conclusion of treatment effect estimation. Simulation studies and data applications both show that our proposed method is valid and superior to existing methods.
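As a rough illustration of the missing-indicator method mentioned above, the following minimal Python sketch fills each missing baseline covariate with a constant and appends a 0/1 indicator column flagging where the value was missing. The function name `missing_indicator_augment`, the fill value of 0, and the toy data are assumptions made for illustration only; this is not the proposed estimator, only the covariate preprocessing step that such an estimator could take as input.

```python
import numpy as np

def missing_indicator_augment(X):
    """Hypothetical helper: return [X_filled, R] for a covariate matrix X.

    Missing entries (NaN) are replaced by 0, and R holds one 0/1 indicator
    column for each covariate that has at least one missing value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    R = missing.astype(float)              # 1 where the covariate is missing
    X_filled = np.where(missing, 0.0, X)   # constant fill (assumed value: 0)
    keep = missing.any(axis=0)             # drop indicators that are all zero
    return np.hstack([X_filled, R[:, keep]])

# Toy data (made up): 3 individuals, 2 baseline covariates.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 5.0]])
Z = missing_indicator_augment(X)
```

Here `Z` has four columns: the two filled covariates followed by their two missingness indicators. A downstream outcome model can then be fit on `Z` rather than on the incomplete `X`.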