Just Ramp-up: Unleash the Potential of Regression-based Estimator for A/B Tests under Network Interference

Recent research in causal inference under network interference has explored various experimental designs and estimation techniques. However, existing methods typically rely on a single experiment, so they often reach a performance bottleneck and struggle to handle diverse interference structures. We propose leveraging multiple experiments to overcome these limitations. In industry, sequential experiments are common: in the so-called ramp-up process, traffic to the treatment is increased gradually because of operational needs such as risk management and cost control. Our approach shifts the focus from these operational aspects to the statistical advantages of merging data from multiple experiments. By combining data from sequentially conducted experiments, we aim to estimate the global average treatment effect (GATE) more effectively. We begin by analyzing the bias and variance of the linear regression estimator for GATE under general linear network interference. We demonstrate that bias plays the dominant role in the bias-variance tradeoff, and we highlight the intrinsic bias reduction achieved by merging data from experiments with strictly different treatment proportions. The essential improvement comes from merging two steps of experimental data: we show that merging more steps is unnecessary under general linear interference, although it can become beneficial when interference is nonlinear. Furthermore, we investigate a more advanced estimator based on graph neural networks. Through extensive simulation studies, we show that the regression-based estimator benefits remarkably from training on merged experiment data, achieving outstanding statistical performance.
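As a rough illustration of the setup described above, the following minimal sketch pools data from two ramp-up steps and fits a linear regression to estimate GATE. Everything specific here is an assumption made for illustration and is not taken from the paper: the outcome model (a direct effect plus a term proportional to the fraction of treated neighbors), the random graph, the parameter values, and the names `simulate_step` and `regression_gate`. In this toy model the regression is correctly specified, so the sketch shows only the mechanics of merging steps; it does not reproduce the paper's bias analysis.

```python
import numpy as np


def simulate_step(adj, p, rng, alpha=1.0, beta=2.0, gamma=1.5, noise=0.1):
    """Simulate one ramp-up step with treatment proportion p.

    Assumed toy outcome model (hypothetical, for illustration only):
        y_i = alpha + beta * z_i + gamma * exposure_i + noise_i
    where exposure_i is the fraction of unit i's neighbors that are treated.
    """
    n = adj.shape[0]
    z = rng.binomial(1, p, size=n).astype(float)   # Bernoulli(p) assignment
    degree = adj.sum(axis=1)
    exposure = adj @ z / np.maximum(degree, 1.0)   # guard isolated nodes
    y = alpha + beta * z + gamma * exposure + noise * rng.standard_normal(n)
    return z, exposure, y


def regression_gate(z, exposure, y):
    """OLS of y on [1, z, exposure]; GATE estimate = direct + spillover coef."""
    X = np.column_stack([np.ones_like(z), z, exposure])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1] + coef[2]


rng = np.random.default_rng(0)
n = 2000

# Undirected random graph with expected degree of roughly 10 (assumed).
upper = np.triu(rng.random((n, n)) < 0.005, k=1)
adj = (upper | upper.T).astype(float)

# Two ramp-up steps with strictly different treatment proportions.
steps = [simulate_step(adj, p, rng) for p in (0.1, 0.5)]

# Merge the steps by stacking their observations, then fit one regression.
z, exposure, y = (np.concatenate(parts) for parts in zip(*steps))
gate_merged = regression_gate(z, exposure, y)

# For comparison: the same estimator fitted on the first step alone.
gate_single = regression_gate(*steps[0])

true_gate = 2.0 + 1.5  # beta + gamma under the assumed model
print(f"true: {true_gate:.3f}  single: {gate_single:.3f}  merged: {gate_merged:.3f}")
```

Stacking the two steps widens the range of observed exposure values, since the fraction of treated neighbors concentrates around 0.1 in the first step and around 0.5 in the second. A graph neural network estimator could be trained on the same merged arrays, but that is beyond this sketch.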
