Online controlled experiments (A/B testing) are fundamental to data-driven decision-making in many companies. Improving the sensitivity of these experiments under a fixed sample size requires reducing the variance of the average treatment effect (ATE) estimator. Existing variance reduction techniques such as CUPED and CUPAC use pre-experiment data, but their effectiveness depends on how predictive those data are of outcomes measured during the experiment. In-experiment data are often more strongly correlated with the outcome, but using arbitrary post-treatment variables can introduce bias. In this paper, we propose a general, robust, and scalable framework that combines both pre-experiment and in-experiment data to achieve variance reduction. Our framework is simple, interpretable, and computationally efficient, making it practical for real-world deployment. We develop the asymptotic theory of the proposed estimator and provide consistent variance estimators. Empirical results from multiple online experiments conducted at Etsy demonstrate substantial additional variance reduction over the current pipeline, even when incorporating only a few post-treatment covariates. These findings underscore the effectiveness of our framework in improving experimental sensitivity and accelerating data-driven decision-making.
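For context on the CUPED baseline named above, the following is a minimal sketch of the standard CUPED adjustment with a single pre-experiment covariate. It is not the framework proposed in this paper. The function names (`cuped_adjust`, `diff_in_means`), the simulated data, and the true effect of 0.5 are hypothetical illustrations, and only the Python standard library is assumed.

```python
import random
from statistics import fmean, variance


def cuped_adjust(y, x):
    """CUPED: return y - theta * (x - mean(x)) with theta = cov(x, y) / var(x).

    theta is estimated once on the pooled sample (both arms), so the same
    adjustment is applied to treatment and control units.
    """
    mx, my = fmean(x), fmean(y)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def diff_in_means(y_t, y_c):
    """Difference-in-means ATE estimate and its (unpooled) variance estimate."""
    ate = fmean(y_t) - fmean(y_c)
    var = variance(y_t) / len(y_t) + variance(y_c) / len(y_c)
    return ate, var


# Hypothetical simulation: x is a pre-experiment metric and y the
# in-experiment outcome, strongly correlated with x; the true ATE is 0.5.
random.seed(0)
n = 2000
x_t = [random.gauss(10, 2) for _ in range(n)]
x_c = [random.gauss(10, 2) for _ in range(n)]
y_t = [xi + 0.5 + random.gauss(0, 1) for xi in x_t]
y_c = [xi + random.gauss(0, 1) for xi in x_c]

ate_raw, var_raw = diff_in_means(y_t, y_c)

# Adjust on the pooled sample, then split back into the two arms.
y_adj = cuped_adjust(y_t + y_c, x_t + x_c)
ate_adj, var_adj = diff_in_means(y_adj[:n], y_adj[n:])

print(f"raw:   ATE={ate_raw:.3f}, var={var_raw:.5f}")
print(f"CUPED: ATE={ate_adj:.3f}, var={var_adj:.5f}")
```

Because the covariate is measured before treatment and theta is shared across arms, the adjusted estimator remains consistent for the ATE. The variance drops sharply here only because the simulated x explains most of the variation in y, which is exactly the dependence on pre-experiment predictiveness described above.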