Practitioners and academics have long appreciated the benefits that experimentation brings to firms. For web-facing firms running online A/B tests, however, balancing covariate information remains challenging when experimental subjects arrive sequentially. In this paper, we study a novel online experimental design problem, which we refer to as the "Online Blocking Problem." In this problem, experimental subjects with heterogeneous covariate information arrive sequentially and must be immediately assigned to either the control or the treatment group, with the objective of minimizing the total discrepancy, defined as the weight of the minimum-weight perfect matching between the two groups. To solve this problem, we propose a novel experimental design approach, which we refer to as the "Pigeonhole Design." The pigeonhole design first partitions the covariate space into smaller subspaces, which we call pigeonholes, and then, as experimental subjects arrive in each pigeonhole, balances the numbers of control and treatment subjects within that pigeonhole. We analyze the theoretical performance of the pigeonhole design and demonstrate its effectiveness by comparing it against two well-known benchmark designs: the matched-pair design and the completely randomized design. We identify scenarios in which the pigeonhole design offers greater benefits over the benchmark designs. Finally, we conduct extensive simulations using Yahoo! data, which show a 10.2% reduction in variance when the pigeonhole design is used to estimate the average treatment effect.
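The abstract describes the pigeonhole design only at a high level. The following is a minimal Python sketch under assumptions that are not taken from the paper: covariates lie in the unit cube [0, 1)^d, the pigeonholes form a uniform grid controlled by a hypothetical `cells_per_dim` parameter, and each arriving subject joins whichever group is currently smaller in its pigeonhole, with a fair coin on ties. The `discrepancy` helper evaluates the matching objective by brute force with Euclidean distance, so it only works for a handful of subjects. All function names are illustrative.

```python
import itertools
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def pigeonhole_of(x: Point, cells_per_dim: int) -> Tuple[int, ...]:
    """Map a covariate vector in [0, 1)^d to the index of its grid cell.

    Assumption: pigeonholes are a uniform grid with `cells_per_dim` cells
    per axis; values equal to 1.0 are clamped into the last cell.
    """
    return tuple(min(int(xi * cells_per_dim), cells_per_dim - 1) for xi in x)


def pigeonhole_assign(stream: Sequence[Point], cells_per_dim: int,
                      rng: random.Random) -> List[int]:
    """Assign each arriving subject immediately: 1 = treatment, 0 = control.

    Within each pigeonhole, the subject goes to whichever group currently has
    fewer members; ties are broken by a fair coin flip (assumed rule).
    """
    counts: Dict[Tuple[int, ...], List[int]] = defaultdict(lambda: [0, 0])
    assignments = []
    for x in stream:
        c = counts[pigeonhole_of(x, cells_per_dim)]  # c[0]: controls, c[1]: treated
        if c[0] == c[1]:
            z = rng.randint(0, 1)
        else:
            z = 0 if c[0] < c[1] else 1
        c[z] += 1
        assignments.append(z)
    return assignments


def complete_randomization(n: int, rng: random.Random) -> List[int]:
    """Benchmark: each subject is treated independently with probability 1/2."""
    return [rng.randint(0, 1) for _ in range(n)]


def discrepancy(stream: Sequence[Point], z: Sequence[int]) -> float:
    """Minimum total Euclidean distance over matchings between the two groups.

    Brute force over permutations, so only usable for a few subjects. If the
    groups differ in size, every subject of the smaller group is matched and
    the surplus is left unmatched (an assumption of this sketch).
    """
    treated = [x for x, zi in zip(stream, z) if zi == 1]
    control = [x for x, zi in zip(stream, z) if zi == 0]
    small, large = sorted((treated, control), key=len)
    best = math.inf
    for perm in itertools.permutations(large, len(small)):
        best = min(best, sum(math.dist(a, b) for a, b in zip(small, perm)))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Eight subjects with two covariates each, arriving in this order.
    subjects = [(rng.random(), rng.random()) for _ in range(8)]
    z_ph = pigeonhole_assign(subjects, cells_per_dim=2, rng=rng)
    z_cr = complete_randomization(len(subjects), rng)
    print("pigeonhole:", discrepancy(subjects, z_ph))
    print("complete randomization:", discrepancy(subjects, z_cr))
```

The coin flip on ties keeps every assignment random, while the "join the smaller group" rule guarantees that treatment and control counts within any pigeonhole never differ by more than one. A realistic implementation would replace the brute-force matching with a polynomial-time assignment solver.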