This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By non-ignorable cluster sizes, we refer to the possibility that the treatment effects may depend non-trivially on the cluster sizes. We frame our analysis in a super-population framework in which cluster sizes are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using a covariate-adaptive stratified randomization procedure. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study and empirical demonstration show the practical relevance of our theoretical results.
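To make the distinction between the two parameters concrete, the following is a minimal sketch on a hypothetical toy population in which the per-unit treatment effect is known and grows with cluster size. The data, the field names (`size`, `unit_effect`), and the function names are illustrative assumptions, not the paper's notation or estimators. The sketch only shows why the two parameters can differ when cluster sizes are non-ignorable.

```python
from statistics import mean

# Hypothetical toy population (an assumption for illustration only):
# each cluster has a size and a per-unit treatment effect that depends on
# the size, which is what makes cluster size non-ignorable.
clusters = [
    {"size": 10, "unit_effect": 1.0},
    {"size": 10, "unit_effect": 1.0},
    {"size": 80, "unit_effect": 3.0},
]


def equally_weighted_effect(clusters):
    """Average of within-cluster mean effects; every cluster counts once."""
    return mean(c["unit_effect"] for c in clusters)


def size_weighted_effect(clusters):
    """Total effect over all units divided by the total number of units."""
    total_effect = sum(c["size"] * c["unit_effect"] for c in clusters)
    total_units = sum(c["size"] for c in clusters)
    return total_effect / total_units


equal = equally_weighted_effect(clusters)
weighted = size_weighted_effect(clusters)
print(f"equally weighted: {equal:.3f}, size weighted: {weighted:.3f}")
```

In this toy population the large cluster has the largest effect, so the size-weighted average exceeds the equally-weighted one. If the effect did not vary with cluster size, the two quantities would coincide.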