Regression adjustment, sometimes known as Controlled-experiment Using Pre-Experiment Data (CUPED), is an important technique in internet experimentation. It decreases the variance of effect size estimates by regressing the goal metric on pre-experiment features, often cutting confidence interval widths in half or more while never making them worse. The tremendous gains of regression adjustment raise the question: how much better can we do by engineering better features from pre-experiment data, for example by using machine learning techniques or synthetic controls? Could we even reduce the variance of our effect size estimates arbitrarily close to zero with the right predictors? Unfortunately, our answer is negative. A simple form of regression adjustment, which uses just the pre-experiment values of the goal metric, captures most of the benefit. Specifically, under a mild assumption that observations closer in time are easier to predict than ones further away in time, we upper bound the potential gains of more sophisticated feature engineering relative to the gains of this simple form of regression adjustment. Theorem 1 shows that the additional reduction in variance is at most $50\%$, or equivalently, that the confidence interval width can be reduced by at most an additional $29\%$.
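As an illustration only, the simple form of regression adjustment can be sketched as follows. The sketch assumes the common textbook CUPED estimator, in which the goal metric is shifted by a multiple of its centered pre-experiment value. The data are synthetic, and the names `cuped_adjust`, `diff_in_means`, and all simulation parameters are hypothetical choices made for this example, not taken from the paper. The last lines check the arithmetic linking a $50\%$ variance reduction to a $29\%$ reduction in confidence interval width.

```python
import math

import numpy as np


def cuped_adjust(y, x):
    """Adjust goal metric y using its pre-experiment value x (assumed textbook CUPED form)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Slope of the pooled least-squares regression of y on x.
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Subtracting the centered covariate leaves the overall mean of y unchanged.
    return y - theta * (x - x.mean())


def diff_in_means(values, treated):
    """Difference between the treated and control group means."""
    return values[treated].mean() - values[~treated].mean()


# Synthetic experiment (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                              # pre-experiment goal metric
treated = rng.integers(0, 2, size=n).astype(bool)   # random assignment
y = 0.8 * x + rng.normal(scale=0.6, size=n) + 0.1 * treated

y_adj = cuped_adjust(y, x)

# Both estimators target the same effect; the adjusted one is less noisy.
raw_estimate = diff_in_means(y, treated)
adjusted_estimate = diff_in_means(y_adj, treated)

# Ratio of adjusted to unadjusted variance equals 1 - corr(x, y)^2.
variance_ratio = y_adj.var(ddof=1) / y.var(ddof=1)
correlation = np.corrcoef(x, y)[0, 1]

# Confidence interval width scales with the square root of the variance,
# so halving the variance shrinks the width by 1 - sqrt(1/2).
extra_width_reduction = 1 - math.sqrt(0.5)

print(f"raw estimate: {raw_estimate:.3f}, adjusted estimate: {adjusted_estimate:.3f}")
print(f"variance ratio: {variance_ratio:.3f}, 1 - corr^2: {1 - correlation**2:.3f}")
print(f"extra CI width reduction at 50% variance reduction: {extra_width_reduction:.1%}")
```

In this sketch the covariate is the pre-experiment value of the goal metric itself, which is the simple baseline that the bound is stated against; richer features would enter as further regressors.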