Cluster-randomized trials often involve units that are irregularly distributed in space without well-separated communities. In these settings, cluster construction is a critical aspect of the design due to the potential for cross-cluster interference. The existing literature relies on partial interference models, which take clusters as given and assume no cross-cluster interference. We relax this assumption by allowing interference to decay with geographic distance between units. This induces a bias-variance trade-off: constructing fewer, larger clusters reduces bias due to interference but increases variance. We propose new estimators that exclude units most potentially impacted by cross-cluster interference and show that this substantially reduces asymptotic bias relative to conventional difference-in-means estimators. We then study the design of clusters to optimize the estimators' rates of convergence. We provide formal justification for a new design that chooses the number of clusters to balance the asymptotic bias and variance of our estimators and uses unsupervised learning to automate cluster construction.
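A minimal sketch of the kind of pipeline the abstract describes may help fix ideas. Everything specific here is an assumption for illustration and is not taken from the paper: k-means (plain Lloyd's algorithm) stands in for the unspecified unsupervised learning step, the trimming rule drops any unit that has a unit from another cluster within a fixed `radius`, and the function names (`kmeans`, `randomize_clusters`, `interior_mask`, `trimmed_difference_in_means`) are hypothetical. The number of clusters `k` is passed in directly; the abstract does not state the rate at which the proposed design chooses it, so none is encoded here.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D coordinates; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize at k sampled units
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every unit to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def randomize_clusters(k, seed=0):
    """Treat a random half of the clusters; returns one bool per cluster."""
    rng = random.Random(seed)
    treated = set(rng.sample(range(k), k // 2))
    return [j in treated for j in range(k)]


def interior_mask(points, labels, radius):
    """True for units with no unit from another cluster within `radius`."""
    return [all(lab_i == lab_j or math.dist(p_i, p_j) > radius
                for p_j, lab_j in zip(points, labels))
            for p_i, lab_i in zip(points, labels)]


def trimmed_difference_in_means(outcomes, treated, keep):
    """Difference in mean outcomes, treated minus control, over kept units."""
    t = [y for y, d, m in zip(outcomes, treated, keep) if m and d]
    c = [y for y, d, m in zip(outcomes, treated, keep) if m and not d]
    if not t or not c:
        raise ValueError("no retained units in one of the arms")
    return sum(t) / len(t) - sum(c) / len(c)


if __name__ == "__main__":
    rng = random.Random(1)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    k, radius = 8, 0.05  # illustrative values, not tuned
    labels, _ = kmeans(points, k)
    cluster_treated = randomize_clusters(k)
    treated = [cluster_treated[lab] for lab in labels]
    # Synthetic outcomes: a unit-level effect of 1 plus noise (no real data).
    outcomes = [1.0 * d + rng.gauss(0, 1) for d in treated]
    keep = interior_mask(points, labels, radius)
    print(trimmed_difference_in_means(outcomes, treated, keep))
```

In this sketch, lowering `k` produces larger clusters, so fewer units fall within `radius` of another cluster and more of the sample is retained, but fewer independent clusters are randomized. This mirrors, in a simplified way, the bias-variance trade-off described above.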