Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and make three contributions. First, we present an estimator of the total treatment effect (or global average treatment effect) in low-order outcome models when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for Bernoulli graph cluster randomized (GCR) designs. Under such designs, the variance of the pseudoinverse estimator scales like the smaller of two quantities: the variance of the estimator derived under a low-order assumption, and the variance obtained from cluster randomization. This shows that combining the two variance reduction strategies is preferable to using either one alone. When the order of the potential outcomes model is correctly specified, our estimator is always unbiased; under a misspecified model, we upper bound the bias in terms of how close the ground-truth model is to a low-order model. Third, we give empirical evidence that our variance bounds can be used to select, from a set of candidate clusterings, a clustering that minimizes the worst-case variance under a cluster randomized design. Across a range of graphs and clustering algorithms, our method consistently selects clusterings that perform well on a range of response models, suggesting that the bounds are useful in practice.
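To make the setting concrete, the following is a minimal Python sketch of a Bernoulli GCR assignment and of the total treatment effect under an order-1 (linear) outcome model. It does not implement the pseudoinverse estimator, which the abstract does not specify. The toy path graph, the effect matrix `C`, the baseline values, and all function names are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def bernoulli_gcr_assignment(cluster_of, p, rng):
    """Treat each cluster independently with probability p.

    Every unit inherits the assignment of its cluster, so treatment is
    constant within clusters (assumed reading of a Bernoulli GCR design).
    """
    cluster_of = np.asarray(cluster_of)
    n_clusters = cluster_of.max() + 1
    cluster_z = rng.random(n_clusters) < p
    return cluster_z[cluster_of].astype(int)

def linear_outcomes(baseline, C, z):
    """Order-1 outcome model (assumed form): Y_i(z) = baseline_i + sum_j C[i, j] * z_j."""
    return baseline + C @ z

def total_treatment_effect(baseline, C):
    """Average difference between outcomes under all-treated and all-control."""
    n = len(baseline)
    y_all_treated = linear_outcomes(baseline, C, np.ones(n))
    y_all_control = linear_outcomes(baseline, C, np.zeros(n))
    return float(np.mean(y_all_treated - y_all_control))

# Hypothetical toy example: a path graph on 6 nodes split into two clusters.
n = 6
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

# Illustrative effects: direct effect 1.0, plus 0.5 from each treated neighbor.
C = np.eye(n) + 0.5 * adjacency
baseline = np.arange(n, dtype=float)
cluster_of = [0, 0, 0, 1, 1, 1]

rng = np.random.default_rng(0)
z = bernoulli_gcr_assignment(cluster_of, p=0.5, rng=rng)
y = linear_outcomes(baseline, C, z)
tte = total_treatment_effect(baseline, C)
```

In this toy model the total treatment effect is the average row sum of `C`. The cluster-level assignment is what distinguishes a GCR design from a unit-randomized Bernoulli design, in which each entry of `z` would be drawn independently.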
