AB testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an AB test as it proceeds. One goal of continuous monitoring is early stopping -- confirming a statistically significant difference between control and treatment as soon as possible. Another goal is to maintain some statistical capability to discover significant differences later in the test if they cannot be confirmed earlier. These goals are in conflict -- looser requirements for early stopping leave us with more stringent ones for later. This paper shows that it is impossible to maintain a constant requirement for significance for tests that have no a priori stopping time, but we can come arbitrarily close to that goal by using tests that require repeated significant results to confirm statistically significant differences between treatment and control.
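The tension between early stopping and a constant significance requirement can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes an A/A test (no true difference) whose per-observation differences are standard normal with known variance, a two-sided z-test applied after every observation, and a hypothetical stopping rule that declares significance only after `k_required` consecutive significant looks. The names `false_positive_rate`, `n_looks`, `k_required`, and `z_crit` are all introduced here for illustration.

```python
import math
import random


def false_positive_rate(n_looks, k_required, z_crit=1.96, n_sims=1000, seed=0):
    """Estimate how often an A/A test is (wrongly) declared significant.

    Assumptions (illustrative, not taken from the paper):
    - each observation is a N(0, 1) treatment-minus-control difference,
      so the null hypothesis is true and the variance is known;
    - the test is evaluated after every observation, up to n_looks;
    - the test stops once k_required consecutive looks have |z| > z_crit.
    """
    false_positives = 0
    for i in range(n_sims):
        # One RNG per simulated test, so path i is identical for every
        # choice of n_looks and k_required and the rules can be compared
        # on the same data.
        rng = random.Random(seed * 1_000_003 + i)
        total = 0.0
        streak = 0
        for n in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)
            z = total / math.sqrt(n)  # z-statistic for the running mean
            if abs(z) > z_crit:
                streak += 1
                if streak >= k_required:
                    false_positives += 1
                    break
            else:
                streak = 0
    return false_positives / n_sims


if __name__ == "__main__":
    for n_looks in (1, 10, 100):
        naive = false_positive_rate(n_looks, k_required=1)
        repeated = false_positive_rate(n_looks, k_required=5)
        print(f"looks={n_looks:4d}  single look rule={naive:.3f}  "
              f"5 consecutive looks rule={repeated:.3f}")
```

With `k_required=1` the rule is plain repeated peeking at a fixed threshold, and the estimated false positive rate grows as the number of looks increases. Requiring several consecutive significant looks can never raise that rate on the same data, since any run of `k_required` significant looks contains a single significant look. How much it lowers the rate, and how the paper calibrates such rules, is not something this sketch establishes.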