This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes we mean that "large" clusters and "small" clusters may be heterogeneous, and, in particular, the effects of the treatment may vary across clusters of differing sizes. In order to permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using a covariate-adaptive stratified randomization procedure. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study and empirical demonstration show the practical relevance of our theoretical results.
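To illustrate why the two parameters can differ when cluster sizes are non-ignorable, the following minimal sketch computes an equally-weighted and a size-weighted average of per-cluster mean treatment effects. The function names, cluster sizes, and effect values are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch also treats the per-cluster effects as known, which is never the case in practice, so it shows only the weighting and not the estimators or inference methods the paper develops.

```python
from typing import Sequence


def equally_weighted_effect(cluster_effects: Sequence[float]) -> float:
    """Average the per-cluster mean effects, counting every cluster once."""
    return sum(cluster_effects) / len(cluster_effects)


def size_weighted_effect(
    cluster_effects: Sequence[float], cluster_sizes: Sequence[int]
) -> float:
    """Average the per-cluster mean effects, weighting each cluster by its size."""
    total_size = sum(cluster_sizes)
    weighted_sum = sum(e * n for e, n in zip(cluster_effects, cluster_sizes))
    return weighted_sum / total_size


# Hypothetical data: three small clusters with a mean effect of 1.0
# and one large cluster with a mean effect of 3.0.
sizes = [10, 10, 10, 70]
effects = [1.0, 1.0, 1.0, 3.0]

equal = equally_weighted_effect(effects)
weighted = size_weighted_effect(effects, sizes)
print(f"equally-weighted: {equal:.2f}, size-weighted: {weighted:.2f}")
```

In this toy example the large cluster has a larger effect, so the size-weighted average exceeds the equally-weighted one. If effects did not vary with cluster size, the two averages would coincide.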