This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes, we mean that "large" clusters and "small" clusters may be heterogeneous and, in particular, that the effects of the treatment may vary across clusters of differing sizes. To permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using a covariate-adaptive stratified randomization procedure. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster, and we demonstrate the implications of such sampling for some commonly used estimators. A small simulation study and an empirical demonstration illustrate the practical relevance of our theoretical results.
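To make the distinction between the two parameters concrete, the following is a minimal sketch on a toy, fully known population of clusters. It is not the paper's estimator or inference procedure: the function names, the `(size, effect)` representation, and the numbers are all hypothetical, and each cluster's average treatment effect is assumed to be known rather than estimated. The sketch only shows that the two weighting schemes can disagree when treatment effects vary with cluster size.

```python
# Illustrative only: hypothetical clusters given as (cluster_size, average_effect_within_cluster).
# In a real experiment these effects are unobserved and must be estimated.

def equally_weighted_effect(clusters):
    """Average of cluster-level effects, each cluster counted once."""
    return sum(effect for _, effect in clusters) / len(clusters)

def size_weighted_effect(clusters):
    """Average of cluster-level effects, each cluster weighted by its size."""
    total_units = sum(size for size, _ in clusters)
    return sum(size * effect for size, effect in clusters) / total_units

# Two small clusters with a modest effect and one large cluster with a bigger effect.
clusters = [(10, 1.0), (10, 1.0), (80, 3.0)]

print(equally_weighted_effect(clusters))  # every cluster has weight 1/3
print(size_weighted_effect(clusters))     # the large cluster has weight 80/100
```

With these hypothetical numbers the equally-weighted parameter is 5/3, while the size-weighted parameter is 2.6, because the large cluster dominates the size-weighted average. If effects did not vary with cluster size, the two quantities would coincide.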