Cluster-randomized trials often involve units that are irregularly distributed in space without well-separated communities. In these settings, cluster construction is a critical aspect of the design due to the potential for cross-cluster interference. The existing literature relies on partial interference models, which take clusters as given and assume no cross-cluster interference. We relax this assumption by allowing interference to decay with geographic distance between units. This induces a bias-variance trade-off: constructing fewer, larger clusters reduces bias due to interference but increases variance. We propose new estimators that exclude units most potentially impacted by cross-cluster interference and show that this substantially reduces asymptotic bias relative to conventional difference-in-means estimators. We provide formal justification for a new design that chooses the number of clusters to balance the asymptotic bias and variance of our estimators and uses unsupervised learning to automate cluster construction.
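To make the design concrete, the following is a minimal sketch of the pipeline the abstract describes: build clusters from unit locations, randomize treatment at the cluster level, and compare a conventional difference-in-means estimate with one that drops units near cluster boundaries. Everything specific here is an assumption for illustration and not the paper's procedure: k-means stands in for the unspecified unsupervised learning step, the fixed-radius exclusion rule and the toy spillover outcome model are hypothetical, the number of clusters `k` is set by hand rather than chosen by the paper's bias-variance criterion, and the function names (`kmeans`, `interior_mask`, `diff_in_means`, `demo`) are invented.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm (an assumed stand-in); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every unit to its nearest current center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean location of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def interior_mask(points, labels, radius):
    """True for units with no unit from a different cluster within `radius`."""
    n = len(points)
    return [all(labels[j] == labels[i]
                or math.dist(points[i], points[j]) > radius
                for j in range(n))
            for i in range(n)]


def diff_in_means(y, treated, keep):
    """Treated-minus-control mean outcome over the units flagged in `keep`."""
    t = [yi for yi, d, k in zip(y, treated, keep) if k and d]
    c = [yi for yi, d, k in zip(y, treated, keep) if k and not d]
    if not t or not c:
        return None  # undefined unless both arms are represented
    return sum(t) / len(t) - sum(c) / len(c)


def demo(n=300, k=6, radius=0.1, seed=0):
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = kmeans(points, k, rng)

    # Cluster-level randomization: treat half of the clusters.
    treated_clusters = set(rng.sample(range(k), k // 2))
    treated = [l in treated_clusters for l in labels]

    # Hypothetical outcome model: direct effect 1.0 plus a spillover equal to
    # 0.5 times the treated share among neighbours within `radius`.
    y = []
    for i, p in enumerate(points):
        nbrs = [treated[j] for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= radius]
        spill = 0.5 * sum(nbrs) / len(nbrs) if nbrs else 0.0
        y.append(1.0 * treated[i] + spill + rng.gauss(0, 0.1))

    naive = diff_in_means(y, treated, [True] * n)
    trimmed = diff_in_means(y, treated, interior_mask(points, labels, radius))
    return naive, trimmed


if __name__ == "__main__":
    naive, trimmed = demo()
    print("difference in means, all units:     ", naive)
    print("difference in means, interior units:", trimmed)
```

Rerunning `demo` with different values of `k` and `radius` gives a rough feel for the trade-off: fewer, larger clusters leave more interior units untouched by cross-cluster spillovers, but the randomization then rests on fewer independent assignments.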