This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. In the first stage of this experimental design, clusters (e.g., households, schools, or graph partitions) are stratified on cluster-level covariates and randomly assigned to control or treatment groups. An independent second-stage design is then carried out, in which units within each treated cluster are further stratified on individual-level covariates and randomly assigned to either control or treatment groups. Under the homogeneous partial interference assumption, I establish conditions under which the proposed difference-in-"average of averages" estimators are consistent and asymptotically normal for the corresponding average primary and spillover effects, and I develop consistent estimators of their asymptotic variances. Combining these results establishes the asymptotic validity of tests based on these estimators. My findings suggest that ignoring covariate information in the design stage can result in efficiency loss, and that commonly used inference methods that ignore or improperly use covariate information can lead to either conservative or invalid inference. Finally, I apply these results to study the optimal use of covariate information under covariate-adaptive randomization in large samples, and demonstrate that a specific generalized matched-pair design achieves the minimum asymptotic variance for each proposed estimator. The practical relevance of the theoretical results is illustrated through a simulation study and an empirical application.
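To make the design concrete, the following is a minimal simulation sketch rather than the paper's implementation. It assumes a simple matched-pair stratification at both stages, an additive outcome model in which spillover depends only on whether the cluster is treated, and one plausible reading of "difference in average of averages": average the within-cluster means over clusters, then take differences against control clusters. All function names, the effect sizes `PRIMARY` and `SPILLOVER`, and the data-generating process are hypothetical.

```python
import random
from statistics import mean


def matched_pair_assignment(covariates, rng):
    """Sort by covariate, pair adjacent items, treat one item per pair at random.

    Assumption: matched pairs stand in for the (more general) stratified designs.
    With an odd count, the last item stays in control.
    """
    order = sorted(range(len(covariates)), key=lambda i: covariates[i])
    treated = [0] * len(covariates)
    for k in range(0, len(order) - 1, 2):
        treated[rng.choice((order[k], order[k + 1]))] = 1
    return treated


def two_stage_assign(cluster_x, unit_x, rng):
    """Stage 1 assigns clusters; stage 2 assigns units inside treated clusters only."""
    z = matched_pair_assignment(cluster_x, rng)
    d = [
        matched_pair_assignment(xs, rng) if zg == 1 else [0] * len(xs)
        for zg, xs in zip(z, unit_x)
    ]
    return z, d


def diff_in_average_of_averages(y, z, d):
    """Return (primary, spillover) estimates from cluster-level averages.

    primary:   treated units in treated clusters vs. control clusters
    spillover: untreated units in treated clusters vs. control clusters
    """
    control = mean(mean(yg) for yg, zg in zip(y, z) if zg == 0)
    treated_avgs, untreated_avgs = [], []
    for yg, zg, dg in zip(y, z, d):
        if zg == 1:
            treated_avgs.append(mean(v for v, di in zip(yg, dg) if di == 1))
            untreated_avgs.append(mean(v for v, di in zip(yg, dg) if di == 0))
    return mean(treated_avgs) - control, mean(untreated_avgs) - control


# Hypothetical data-generating process, for illustration only.
rng = random.Random(0)
G, n = 200, 10                      # number of clusters, units per cluster
PRIMARY, SPILLOVER = 2.0, 0.5       # assumed true effects
cluster_x = [rng.gauss(0, 1) for _ in range(G)]
unit_x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(G)]
z, d = two_stage_assign(cluster_x, unit_x, rng)
y = [
    [cx + ux + PRIMARY * di + SPILLOVER * zg * (1 - di) + rng.gauss(0, 1)
     for ux, di in zip(xs, dg)]
    for cx, xs, zg, dg in zip(cluster_x, unit_x, z, d)
]
primary_hat, spillover_hat = diff_in_average_of_averages(y, z, d)
print(f"primary estimate: {primary_hat:.3f}, spillover estimate: {spillover_hat:.3f}")
```

Averaging within clusters first gives every cluster equal weight regardless of its size, which is why the sketch does not pool units across clusters. Variance estimation, which the paper treats separately, is not attempted here.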