Randomized controlled trials often suffer from interference, a violation of the Stable Unit Treatment Value Assumption (SUTVA) in which a unit's treatment assignment affects the outcomes of its neighbors. This interference biases naive estimators of the average treatment effect (ATE). A popular method to achieve unbiasedness is to pair the Horvitz-Thompson estimator of the ATE with a known exposure mapping: a function that identifies which units in a given randomization are not subject to interference. For example, an exposure mapping can specify that any unit with at least an $h$-fraction of its neighbors sharing its treatment status does not experience interference. However, this threshold $h$ is difficult to elicit from domain experts, and a misspecified threshold can induce bias. In this work, we propose a data-adaptive method to select the $h$-fraction threshold that minimizes the mean squared error of the Horvitz-Thompson estimator. Our method estimates the bias and variance of the Horvitz-Thompson estimator under different thresholds using a linear dose-response model of the potential outcomes. We present simulations illustrating that our method improves upon non-adaptive choices of the threshold. We further illustrate the performance of our estimator by running experiments on a publicly available Amazon product similarity graph. Furthermore, we demonstrate that our method is robust to deviations from the linear potential outcomes model.
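To make the fixed-threshold estimator concrete, the following is a minimal sketch of a Horvitz-Thompson ATE estimate under an $h$-fraction exposure mapping. It is not the adaptive threshold-selection method proposed in this work. It assumes, for illustration only, an independent Bernoulli($p$) randomization, so that exposure probabilities have a closed binomial form; the function names `binom_tail` and `ht_estimate` and the adjacency-matrix input are hypothetical choices, not taken from the paper.

```python
import math
import numpy as np

def binom_tail(d, p, k):
    """Exact P(Binomial(d, p) >= k)."""
    return sum(math.comb(d, j) * p**j * (1 - p)**(d - j) for j in range(k, d + 1))

def ht_estimate(adj, z, y, p, h):
    """Horvitz-Thompson ATE estimate with an h-fraction exposure mapping.

    Assumption (not from the paper): each unit is treated independently
    with probability p, so exposure probabilities are binomial tails.
    adj is a symmetric 0/1 adjacency matrix, z the 0/1 assignment vector,
    y the observed outcomes.
    """
    adj = np.asarray(adj)
    z = np.asarray(z)
    y = np.asarray(y, dtype=float)
    deg = adj.sum(axis=1)

    # Minimum number of same-status neighbors needed to reach fraction h.
    # The small offset guards against floating-point error in h * deg.
    k = np.ceil(h * deg - 1e-9).astype(int)

    treated_nb = adj @ z
    control_nb = deg - treated_nb

    # A unit counts as free of interference when it and at least an
    # h-fraction of its neighbors share the same treatment status.
    exposed1 = (z == 1) & (treated_nb >= k)
    exposed0 = (z == 0) & (control_nb >= k)

    # Exposure probabilities under independent Bernoulli(p) assignment.
    pi1 = np.array([p * binom_tail(int(d), p, int(kk)) for d, kk in zip(deg, k)])
    pi0 = np.array([(1 - p) * binom_tail(int(d), 1 - p, int(kk)) for d, kk in zip(deg, k)])

    return float(np.mean(exposed1 * y / pi1 - exposed0 * y / pi0))
```

Larger values of `h` demand more same-status neighbors, which shrinks the exposure probabilities and inflates the variance of the inverse-probability weights, while smaller values admit units that may still be affected by interference. This is the bias-variance trade-off that motivates choosing the threshold from data rather than fixing it in advance.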