Inference for Two-stage Experiments under Covariate-Adaptive Randomization

This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. In the first stage of this experimental design, clusters (e.g., households, schools, or graph partitions) are stratified on cluster-level covariates and randomly assigned to control or treatment groups. An independent second-stage design is then carried out, in which units within each treated cluster are further stratified on individual-level covariates and randomly assigned to either control or treatment groups. Under the homogeneous partial interference assumption, I establish conditions under which the proposed difference-in-"average of averages" estimators are consistent and asymptotically normal for the corresponding average primary and spillover effects, and I develop consistent estimators of their asymptotic variances. Combining these results establishes the asymptotic validity of tests based on these estimators. My findings suggest that ignoring covariate information in the design stage can result in efficiency loss, and that commonly used inference methods that ignore or improperly use covariate information can lead to either conservative or invalid inference. I then apply these results to study the optimal use of covariate information under covariate-adaptive randomization in large samples, and demonstrate that a specific generalized matched-pair design achieves the minimum asymptotic variance for each proposed estimator. Finally, I discuss covariate adjustment, which incorporates additional baseline covariates not used for treatment assignment. The practical relevance of the theoretical results is illustrated through a simulation study and an empirical application.
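To make the design concrete, the following is a minimal simulation sketch and not the paper's implementation. It assumes quantile-based strata, half of each stratum assigned to treatment, and a hypothetical linear outcome model with a primary effect of 2.0 and a spillover effect of 0.5. The function names (`stratified_assign`, `simulate_and_estimate`) and the exact form of the estimators (a difference between averages of within-cluster averages) are illustrative assumptions based on the description above.

```python
import numpy as np


def stratified_assign(x, n_strata, rng):
    """Assign half of the units in each covariate-quantile stratum to treatment.

    Assumption: strata are formed by sorting on x and splitting into
    n_strata roughly equal blocks; this is one simple stratification rule.
    """
    order = np.argsort(x, kind="stable")
    d = np.zeros(len(x), dtype=int)
    for idx in np.array_split(order, n_strata):
        treated = rng.choice(idx, size=len(idx) // 2, replace=False)
        d[treated] = 1
    return d


def simulate_and_estimate(n_clusters=400, cluster_size=10, seed=0):
    """Simulate a two-stage design and return (primary, spillover) estimates."""
    rng = np.random.default_rng(seed)

    # Stage 1: stratify clusters on a cluster-level covariate w.
    w = rng.normal(size=n_clusters)
    h = stratified_assign(w, n_strata=4, rng=rng)

    treated_means, untreated_means, control_means = [], [], []
    for g in range(n_clusters):
        x = rng.normal(size=cluster_size)  # individual-level covariate
        if h[g] == 1:
            # Stage 2: stratify units within a treated cluster on x.
            z = stratified_assign(x, n_strata=2, rng=rng)
        else:
            z = np.zeros(cluster_size, dtype=int)

        # Hypothetical outcome model: the outcome depends only on own
        # treatment z and on the cluster's assignment h[g].
        noise = rng.normal(size=cluster_size)
        y = w[g] + x + 2.0 * z + 0.5 * h[g] * (1 - z) + noise

        if h[g] == 1:
            treated_means.append(y[z == 1].mean())
            untreated_means.append(y[z == 0].mean())
        else:
            control_means.append(y.mean())

    # Difference in "average of averages": average the within-cluster
    # means over clusters, then difference against control clusters.
    baseline = np.mean(control_means)
    primary = np.mean(treated_means) - baseline
    spillover = np.mean(untreated_means) - baseline
    return primary, spillover
```

Calling `simulate_and_estimate()` returns point estimates that should lie near the assumed effects of 2.0 and 0.5. The sketch does not compute the variance estimators or tests discussed in the paper.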
