Online controlled experiments (A/B tests) are essential to data-driven decision-making at many companies. Increasing the sensitivity of these experiments, particularly with a fixed sample size, relies on reducing the variance of the estimator for the average treatment effect (ATE). Existing methods such as CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome. In contrast, in-experiment data is often more strongly correlated with the outcome and thus more informative. In this paper, we introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC, without introducing bias or additional computational complexity. We also establish asymptotic theory and provide consistent variance estimators for our method. Applying this method to multiple online experiments at Etsy, we achieve substantial variance reduction over CUPAC with the inclusion of only a few in-experiment covariates. These results highlight the potential of our approach to significantly improve experiment sensitivity and accelerate decision-making.
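For readers unfamiliar with the baseline, the following is a minimal sketch of the standard CUPED adjustment that the abstract compares against. It is not the method proposed in this paper. The data are synthetic, and the function names (`cuped_adjust`, `difference_in_means`, `simulate`), the effect size, and the noise levels are assumptions chosen purely for illustration.

```python
import numpy as np


def cuped_adjust(y, x):
    """Standard CUPED: subtract theta * (x - mean(x)) from the outcome.

    theta = cov(x, y) / var(x) is estimated on the pooled sample. Because x is
    measured before assignment, it is independent of treatment, so the
    adjustment leaves the expected difference in means unchanged.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())


def difference_in_means(y, treated):
    """Difference-in-means estimate of the ATE."""
    treated = np.asarray(treated, dtype=bool)
    return y[treated].mean() - y[~treated].mean()


def simulate(seed=0, n=20_000, effect=0.1):
    """Synthetic experiment (illustrative values only, not from the paper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # hypothetical pre-experiment covariate
    treated = rng.random(n) < 0.5          # randomized 50/50 assignment
    y = 0.8 * x + effect * treated + rng.normal(scale=0.6, size=n)
    return y, x, treated


y, x, treated = simulate()
y_adj = cuped_adjust(y, x)

raw_var = y.var(ddof=1)
adj_var = y_adj.var(ddof=1)
ate_raw = difference_in_means(y, treated)
ate_adj = difference_in_means(y_adj, treated)

print(f"outcome variance: raw={raw_var:.3f}, CUPED-adjusted={adj_var:.3f}")
print(f"ATE estimate:     raw={ate_raw:.3f}, CUPED-adjusted={ate_adj:.3f}")
```

In this sketch the variance of the adjusted outcome shrinks by roughly the squared correlation between `x` and `y`, which is why the abstract stresses that the benefit of CUPED and CUPAC is limited by how well pre-experiment data predicts the outcome.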