Practitioners and academics have long appreciated the benefits of covariate balancing in randomized experiments. For web-facing firms running online A/B tests, however, balancing covariate information remains challenging when experimental subjects arrive sequentially. In this paper, we study an online experimental design problem, which we refer to as the "Online Blocking Problem." In this problem, experimental subjects with heterogeneous covariate information arrive sequentially, and each must be assigned immediately to either the control group or the treatment group. The objective is to minimize the total discrepancy, defined as the weight of the minimum-weight perfect matching between the two groups. To solve this problem, we propose a randomized experimental design, which we refer to as the "Pigeonhole Design." The pigeonhole design first partitions the covariate space into smaller regions, which we refer to as pigeonholes, and then, as experimental subjects arrive, balances the number of control and treated subjects within each pigeonhole. We analyze the theoretical performance of the pigeonhole design and show its effectiveness by comparing it against two well-known benchmark designs: the matched-pair design and the completely randomized design. We identify scenarios in which the pigeonhole design offers greater benefits than the benchmark designs. To conclude, we conduct extensive simulations using Yahoo! data and show a 10.2% reduction in variance when the pigeonhole design is used to estimate the average treatment effect.
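To make the assignment rule concrete, the following Python sketch illustrates one way a pigeonhole-style design could be implemented. It is not taken from the paper: it assumes covariates lie in the unit cube, uses an equal-width grid as the partition, and balances each pigeonhole by flipping a fair coin when the cell is balanced and otherwise filling the smaller group. The names `pigeonhole_index`, `PigeonholeAssigner`, and `cells_per_dim` are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def pigeonhole_index(x, cells_per_dim):
    """Map a covariate vector in [0, 1]^d to its grid cell (assumed partition)."""
    # Clamp so that a coordinate equal to 1.0 falls in the last cell.
    return tuple(min(int(v * cells_per_dim), cells_per_dim - 1) for v in x)


class PigeonholeAssigner:
    """Assigns sequentially arriving subjects while balancing each pigeonhole."""

    def __init__(self, cells_per_dim, seed=None):
        self.cells_per_dim = cells_per_dim
        self.rng = random.Random(seed)
        # counts[cell] = [number of control subjects, number of treated subjects]
        self.counts = defaultdict(lambda: [0, 0])

    def assign(self, x):
        """Return 0 (control) or 1 (treatment) for a subject with covariates x."""
        cell_counts = self.counts[pigeonhole_index(x, self.cells_per_dim)]
        if cell_counts[0] == cell_counts[1]:
            # Balanced pigeonhole: randomize with a fair coin (assumed rule).
            arm = self.rng.randint(0, 1)
        else:
            # Unbalanced pigeonhole: assign to the group with fewer subjects.
            arm = 0 if cell_counts[0] < cell_counts[1] else 1
        cell_counts[arm] += 1
        return arm


# Simulated arrivals: 200 subjects with two covariates drawn uniformly.
assigner = PigeonholeAssigner(cells_per_dim=4, seed=0)
stream = random.Random(1)
subjects = [(stream.random(), stream.random()) for _ in range(200)]
arms = [assigner.assign(x) for x in subjects]

# Largest control/treatment gap over all occupied pigeonholes.
max_gap = max(abs(c[0] - c[1]) for c in assigner.counts.values())
```

Under this rule the control and treated counts in any pigeonhole never differ by more than one, so the overall imbalance is bounded by the number of occupied pigeonholes. How finely to partition the covariate space is a design choice that this sketch leaves to the `cells_per_dim` parameter.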