Cluster-randomized experiments are increasingly used to evaluate interventions in routine practice conditions, and researchers often adopt model-based methods with covariate adjustment in the statistical analyses. However, the validity of model-based covariate adjustment is unclear when the working models are misspecified, leading to ambiguity of estimands and risk of bias. In this article, we first adapt two conventional model-based methods, generalized estimating equations and linear mixed models, with weighted g-computation to achieve robust inference for cluster-average and individual-average treatment effects. To further overcome the limitations of model-based covariate adjustment methods, we propose an efficient estimator for each estimand that allows for flexible covariate adjustment and additionally addresses cluster size variation dependent on treatment assignment and other cluster characteristics. Such cluster size variations often occur post-randomization and, if ignored, can lead to bias of model-based estimators. For our proposed efficient covariate-adjusted estimator, we prove that when the nuisance functions are consistently estimated by machine learning algorithms, the estimator is consistent, asymptotically normal, and efficient. When the nuisance functions are estimated via parametric working models, the estimator is triply-robust. Simulation studies and analyses of three real-world cluster-randomized experiments demonstrate that the proposed methods are superior to existing alternatives.
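The abstract names two estimands, the cluster-average and the individual-average treatment effect, without defining them. The toy sketch below illustrates the distinction under the common convention (assumed here, not taken from the abstract) that the former weights every cluster equally and the latter weights every individual equally. The `Cluster` class, the function names, and the data are hypothetical, and both potential outcomes are treated as known, which is never the case in a real experiment. This is not the proposed estimator, only an illustration of why the two targets can differ when the effect varies with cluster size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Potential outcomes for every individual in one cluster (hypothetical toy data)."""
    y1: List[float]  # outcomes if the cluster were treated
    y0: List[float]  # outcomes if the cluster were in control


def cluster_average_effect(clusters: List[Cluster]) -> float:
    # Take the mean effect within each cluster, then average those means,
    # so every cluster counts equally regardless of its size.
    per_cluster = [
        sum(a - b for a, b in zip(c.y1, c.y0)) / len(c.y1) for c in clusters
    ]
    return sum(per_cluster) / len(per_cluster)


def individual_average_effect(clusters: List[Cluster]) -> float:
    # Pool all individuals across clusters, so every individual counts equally
    # and larger clusters contribute more.
    total_effect = sum(sum(a - b for a, b in zip(c.y1, c.y0)) for c in clusters)
    total_n = sum(len(c.y1) for c in clusters)
    return total_effect / total_n


# Two clusters whose treatment effect depends on cluster size.
small = Cluster(y1=[1.0] * 10, y0=[0.0] * 10)  # 10 individuals, effect 1 each
large = Cluster(y1=[3.0] * 90, y0=[0.0] * 90)  # 90 individuals, effect 3 each
clusters = [small, large]

delta_c = cluster_average_effect(clusters)     # (1 + 3) / 2
delta_i = individual_average_effect(clusters)  # (10 * 1 + 90 * 3) / 100
print(f"cluster-average effect:    {delta_c:.2f}")
print(f"individual-average effect: {delta_i:.2f}")
```

In this example the two quantities are 2.0 and 2.8. They coincide only when the effect is unrelated to cluster size, which is why an analysis that leaves the weighting implicit can end up targeting an ambiguous estimand.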