A/B testing is critical for modern technology companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that maximize the information obtained from online experiments so that treatment effects can be estimated accurately. We propose three optimal allocation strategies for a dynamic setting in which treatments are assigned sequentially over time. These strategies are designed to minimize the variance of the treatment effect estimator when the data follow a non-Markov decision process or a (time-varying) Markov decision process. We further develop estimation procedures based on existing off-policy evaluation (OPE) methods and conduct extensive experiments in various environments to demonstrate the effectiveness of the proposed methodologies. On the theoretical side, we prove the optimality of the proposed treatment allocation design and establish upper bounds on the mean squared errors of the resulting treatment effect estimators.