Continuous and efficient experimentation is key to the practical success of user-facing applications on the web, both through online A/B-tests and off-policy evaluation. Despite their shared objective -- estimating the incremental value of a treatment -- these domains often operate in isolation, utilising distinct terminologies and statistical toolkits. This paper bridges that divide by establishing a formal equivalence between their canonical variance reduction methods. We prove that the standard online Difference-in-Means estimator is mathematically identical to an off-policy Inverse Propensity Scoring estimator equipped with an optimal (variance-minimising) additive control variate. Extending this unification, we demonstrate that widespread regression adjustment methods (such as CUPED, CUPAC, and ML-RATE) are structurally equivalent to Doubly Robust estimation. This unified view extends our understanding of commonly used approaches, and can guide practitioners and researchers working on either class of problems.
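The central identity above can be checked numerically. The sketch below, assuming Bernoulli treatment assignment with known propensity `e`, computes the Difference-in-Means estimator, the plain IPS estimator, and the control-variate-adjusted IPS estimator; the plug-in coefficient `beta = (1 - e) * Ybar1 + e * Ybar0` is one illustrative form of the variance-minimising choice, under which the adjusted IPS estimate coincides exactly with Difference-in-Means.

```python
import numpy as np

rng = np.random.default_rng(0)
n, e = 10_000, 0.3                       # sample size, treatment propensity
T = rng.binomial(1, e, size=n)           # Bernoulli treatment assignment
Y = 1.0 + 2.0 * T + rng.normal(size=n)   # outcomes with a true effect of +2

# Difference-in-Means: the canonical online A/B-test estimator
dim = Y[T == 1].mean() - Y[T == 0].mean()

# Plain IPS: the canonical off-policy estimator with known propensities
ips = np.mean(T * Y / e - (1 - T) * Y / (1 - e))

# Additive control variate: W has known expectation zero under randomisation
W = T / e - (1 - T) / (1 - e)
# Illustrative variance-minimising plug-in coefficient (an assumption here,
# not quoted from the paper): a propensity-weighted blend of the arm means.
beta = (1 - e) * Y[T == 1].mean() + e * Y[T == 0].mean()
ips_cv = ips - beta * W.mean()

# With this coefficient, the adjusted IPS estimate equals DiM exactly.
assert np.isclose(ips_cv, dim)
```

Plain IPS is unbiased but noisy whenever realised group sizes deviate from `n*e`; the control variate absorbs exactly that imbalance, which is why the adjusted estimator lands on the Difference-in-Means value.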