Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.
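As an illustration of the two clustered designs compared above, the following is a minimal sketch of how treatment could be assigned under each. It assumes the standard definitions: clustered Bernoulli randomization treats each cluster independently with probability p, and clustered complete randomization treats a fixed number of clusters chosen uniformly at random. The function names, the toy clustering, and the parameter values are hypothetical and not taken from the paper. The sketch covers only the assignment step, not the pseudoinverse estimator, whose form the abstract does not specify.

```python
import random


def cluster_bernoulli(clusters, p, rng):
    """Treat each cluster independently with probability p (hypothetical helper)."""
    z = {}
    for members in clusters:
        treated = 1 if rng.random() < p else 0
        for unit in members:
            z[unit] = treated  # every unit inherits its cluster's assignment
    return z


def cluster_complete(clusters, n_treated, rng):
    """Treat exactly n_treated clusters, chosen uniformly at random (hypothetical helper)."""
    chosen = set(rng.sample(range(len(clusters)), n_treated))
    z = {}
    for idx, members in enumerate(clusters):
        for unit in members:
            z[unit] = 1 if idx in chosen else 0
    return z


# Toy clustering of 10 units into 4 clusters (illustrative only).
clusters = [[0, 1, 2], [3, 4], [5, 6, 7], [8, 9]]
rng = random.Random(0)

z_bern = cluster_bernoulli(clusters, p=0.5, rng=rng)
z_comp = cluster_complete(clusters, n_treated=2, rng=rng)
```

Under the Bernoulli variant the number of treated clusters is random, whereas under the complete variant it is fixed in advance. That fixed count makes cluster assignments dependent on one another, which is one way the two designs differ.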
