AB testing aids business operators in their decision making and is considered the gold-standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners and the constraints imposed by the statistical hypothesis testing methodologies commonly used to analyze AB tests. These limitations include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need for sequential testing to allow early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address these limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, while enabling sequential testing and progressive early stopping without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to extract composite global learnings from past AB tests in order to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation, and we demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.