Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.
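To make the setting concrete, the following is a minimal sketch of the two ingredients the abstract combines: a clustered Bernoulli design and a low-order (here order-1, i.e. linear) outcome model. It does not reproduce the pseudoinverse estimator, whose form is not given in the abstract. The function names, the toy path graph, the coefficient matrix `C`, and the baseline values are all illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np


def cluster_bernoulli(clusters, p, rng):
    """Treat each whole cluster independently with probability p (assumed design)."""
    clusters = np.asarray(clusters)
    labels, inverse = np.unique(clusters, return_inverse=True)
    cluster_z = rng.random(labels.size) < p   # one coin flip per cluster
    return cluster_z[inverse].astype(int)     # broadcast cluster draw to units


def linear_outcomes(C, baseline, z):
    """Order-1 outcome model (assumed): Y_i = baseline_i + sum_j C[i, j] * z_j."""
    return baseline + C @ z


def total_treatment_effect(C, baseline):
    """Average outcome under all-treated minus average outcome under all-control."""
    n = C.shape[0]
    y_all_treated = linear_outcomes(C, baseline, np.ones(n))
    y_all_control = linear_outcomes(C, baseline, np.zeros(n))
    return float(np.mean(y_all_treated - y_all_control))


rng = np.random.default_rng(0)
n = 6

# Toy interference graph: a path on 6 nodes, with self-loops for direct effects.
A = np.eye(n)
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

C = 0.5 * A                        # hypothetical effect of unit j on unit i
baseline = np.arange(n, dtype=float)
clusters = [0, 0, 0, 1, 1, 1]      # hypothetical two-cluster partition

z = cluster_bernoulli(clusters, p=0.5, rng=rng)
y = linear_outcomes(C, baseline, z)
tte = total_treatment_effect(C, baseline)
```

Under this linear model the total treatment effect reduces to the average row sum of `C`, and every unit in a cluster shares its cluster's assignment, which is the structure the bounds in the abstract are stated in terms of.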
