We describe how to calculate standard errors for A/B tests that involve clustered data, ratio metrics, covariate adjustment, or any combination of these. The calculation is needed both before an experiment, for power analysis and sample-size calculations based on historical data, and after it, for hypothesis tests and confidence intervals. These applications share a common framework: each standard error is obtained from the sample variance of a suitable set of residuals. The framework suits modular software, plugs into standard tools, requires no covariance-matrix computations, and is numerically stable. Using this approach, we estimate that covariate adjustment yields a median variance reduction of 66% for a key metric. Because the required sample size is proportional to the variance, this reduces the required experiment run time by 66%.