In the experimental design literature, Neyman allocation refers to the practice of allocating units into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. Fortunately, the multi-stage nature of these applications allows earlier-stage observations to be used to estimate the standard deviations, which in turn guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. We provide theory for estimation and inference using data collected from our adaptive Neyman allocation algorithm. We demonstrate the effectiveness of our adaptive Neyman allocation algorithm using both online A/B testing data from a social media site and synthetic data.
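To make the allocation rule described in the abstract concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a difference-in-means estimator, whose variance with arm sizes n_t and n_c is sd_t^2 / n_t + sd_c^2 / n_c, so that the variance-minimizing treated fraction is sd_t / (sd_t + sd_c). The function names, the equal-split pilot stage, and the parameter `n_pilot_per_arm` are all hypothetical choices made for illustration; the adaptive algorithm proposed in the paper may differ in how it uses earlier-stage data.

```python
import random
import statistics


def neyman_fraction(sd_treated, sd_control):
    """Share of units assigned to treatment under Neyman allocation."""
    total = sd_treated + sd_control
    if total == 0:
        return 0.5  # no variability in either arm: fall back to an equal split
    return sd_treated / total


def dim_variance(sd_treated, sd_control, n_treated, n_control):
    """Variance of the difference-in-means estimator for given arm sizes."""
    return sd_treated**2 / n_treated + sd_control**2 / n_control


def two_stage_allocation(n_total, n_pilot_per_arm, draw_treated, draw_control):
    """Hypothetical two-stage sketch (not the paper's algorithm).

    Stage 1 assigns n_pilot_per_arm units to each arm and estimates the two
    standard deviations. Stage 2 picks overall arm sizes from the plug-in
    Neyman fraction. Assumes n_pilot_per_arm >= 2 and
    n_total >= 2 * n_pilot_per_arm.
    """
    pilot_t = [draw_treated() for _ in range(n_pilot_per_arm)]
    pilot_c = [draw_control() for _ in range(n_pilot_per_arm)]
    frac = neyman_fraction(statistics.stdev(pilot_t), statistics.stdev(pilot_c))
    # Each arm must be at least as large as its pilot sample.
    n_treated = round(frac * n_total)
    n_treated = min(max(n_treated, n_pilot_per_arm), n_total - n_pilot_per_arm)
    return n_treated, n_total - n_treated


# Known standard deviations 3 and 1 with 400 units in total.
frac = neyman_fraction(3.0, 1.0)
n_t = round(frac * 400)
print("treated fraction:", frac)
print("Neyman variance:", dim_variance(3.0, 1.0, n_t, 400 - n_t))
print("equal-split variance:", dim_variance(3.0, 1.0, 200, 200))

# Unknown standard deviations: estimate them from a pilot of 50 units per arm.
rng = random.Random(0)
print(two_stage_allocation(400, 50, lambda: rng.gauss(0, 3), lambda: rng.gauss(0, 1)))
```

With known standard deviations of 3 and 1, the rule sends three quarters of the units to the treated arm, and the resulting variance is about 0.04 compared with about 0.05 under an equal split. The two-stage function shows, in the simplest possible form, how earlier observations can stand in for the unknown standard deviations.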