Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-order outcome model when the data are collected under a general experimental design, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for cluster-randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales as the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, we show that our bounds can be used to select a clustering from a set of candidate clusterings when choosing a clustered experimental design. Across a range of graphs and clustering algorithms, our method consistently selects clusterings that perform well on a range of response models, suggesting that the bounds are useful to practitioners.
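
To make the setting concrete, the following minimal Python sketch illustrates a clustered Bernoulli design and the total treatment effect under a simple order-1 (linear) outcome model. It is not the pseudoinverse estimator described above. The function names, the neighbor-fraction form of the spillover term, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def clustered_bernoulli(clusters, p, rng):
    """Treat each cluster independently with probability p; units inherit their cluster's arm."""
    clusters = np.asarray(clusters)
    n_clusters = int(clusters.max()) + 1
    cluster_z = rng.random(n_clusters) < p      # one coin flip per cluster
    return cluster_z[clusters].astype(int)      # broadcast to units

def linear_outcomes(adj, z, baseline, direct, spill):
    """Hypothetical order-1 model:
    Y_i = baseline_i + direct_i * z_i + spill * (fraction of i's neighbors treated)."""
    deg = np.maximum(adj.sum(axis=1), 1)        # avoid division by zero for isolated nodes
    return baseline + direct * z + spill * (adj @ z) / deg

def total_treatment_effect(adj, baseline, direct, spill):
    """Average difference between all-treated and all-control outcomes."""
    n = adj.shape[0]
    y_all = linear_outcomes(adj, np.ones(n), baseline, direct, spill)
    y_none = linear_outcomes(adj, np.zeros(n), baseline, direct, spill)
    return float(np.mean(y_all - y_none))

# Illustrative usage on a 6-node path graph split into three clusters of two nodes.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
z = clustered_bernoulli([0, 0, 1, 1, 2, 2], p=0.5, rng=rng)
y = linear_outcomes(adj, z, baseline=np.zeros(n), direct=np.ones(n), spill=0.5)
tte = total_treatment_effect(adj, np.zeros(n), np.ones(n), 0.5)
```

In this toy model every node has at least one neighbor, so the total treatment effect equals the mean direct effect plus the spillover coefficient. Cluster-level assignment makes a unit's neighbors more likely to share its treatment arm, which is the intuition behind using clustered designs under interference.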
