Clustered randomized controlled trials are often stratified or pair-matched to improve covariate balance and efficiency. Sample average treatment effects (SATEs) are commonly estimated by averaging stratum-level treatment-control mean contrasts, an approach that is natural and widely used. We show that, in stratified clustered trials with heterogeneous cluster sizes, such estimators need not be consistent for the SATE. They can converge to the wrong limit even under correct randomization and without model misspecification. The source is a covariance between cluster sizes and treatment effects: stratumwise averaging mis-weights clusters, producing a bias of constant order that does not vanish as the sample grows. We study the Hájek (ratio) estimator as a robust alternative. By aggregating outcomes within treatment groups before taking their difference, it remains consistent whether a clustered trial grows through larger strata or through more strata. Even so, its use in design-based analyses of clustered trials has been limited by the lack of variance estimators. We develop a design-based variance estimator that applies to any number of strata of any size and show that it is asymptotically conservative, a property that holds even when some strata contain only a single treated or control unit. We also present tests that improve the coverage of Wald tests when the number of clusters is moderate. The framework extends naturally to covariate-adjusted estimators via a variance orthogonality property.
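To make the mis-weighting concrete, the following is a minimal simulation sketch and not the authors' code. It assumes a hypothetical pair-matched design in which every stratum holds one small cluster (10 individuals, effect 0) and one large cluster (100 individuals, effect 1), with one cluster per pair randomized to treatment. The function name, sizes, effects, and noise level are all illustrative assumptions. In this setup the individual-level SATE is 100/110, while the stratum-averaged contrast has expectation (0 + 1)/2 in every pair, so adding more pairs does not remove the gap.

```python
import numpy as np


def simulate_pair_matched(num_pairs=20_000, seed=0):
    """Compare a stratum-averaged contrast with the Hajek estimator.

    Hypothetical design (an assumption for illustration): each stratum is a
    pair with one small and one large cluster, and the larger cluster has the
    larger treatment effect.
    """
    rng = np.random.default_rng(seed)

    # Cluster sizes and cluster-mean treatment effects, one row per pair.
    sizes = np.tile([10.0, 100.0], (num_pairs, 1))
    effects = np.tile([0.0, 1.0], (num_pairs, 1))

    # Cluster-mean potential outcomes under control and under treatment.
    y0 = rng.normal(0.0, 0.5, size=(num_pairs, 2))
    y1 = y0 + effects

    # Individual-level SATE: cluster effects weighted by cluster size.
    sate = np.sum(sizes * effects) / np.sum(sizes)

    # Randomize exactly one cluster per pair to treatment.
    rows = np.arange(num_pairs)
    treated_col = rng.integers(0, 2, size=num_pairs)
    control_col = 1 - treated_col
    y_t, n_t = y1[rows, treated_col], sizes[rows, treated_col]
    y_c, n_c = y0[rows, control_col], sizes[rows, control_col]

    # Stratum-averaged estimator: within-pair treated-minus-control mean
    # contrast, averaged across pairs with stratum-size weights.
    stratum_weights = sizes.sum(axis=1)
    stratum_avg = np.sum(stratum_weights * (y_t - y_c)) / np.sum(stratum_weights)

    # Hajek (ratio) estimator: pool outcomes within each arm first,
    # then take the difference of the two size-weighted means.
    hajek = np.sum(n_t * y_t) / np.sum(n_t) - np.sum(n_c * y_c) / np.sum(n_c)

    return sate, stratum_avg, hajek


if __name__ == "__main__":
    sate, stratum_avg, hajek = simulate_pair_matched()
    print(f"SATE={sate:.3f}  stratum-averaged={stratum_avg:.3f}  Hajek={hajek:.3f}")
```

With many pairs, the Hájek estimate should sit close to the SATE, while the stratum-averaged estimate should settle near 0.5. The sketch covers point estimation only; it does not implement the variance estimator or the tests described in the abstract.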