Given n experimental subjects with potentially heterogeneous covariates and two possible treatments, namely active treatment and control, this paper addresses the fundamental question of determining the optimal achievable accuracy in estimating the treatment effect. Furthermore, we propose an experimental design that approaches this optimal accuracy, giving a (non-asymptotic) answer to this fundamental yet still open question. Our methodological contributions are as follows. First, we establish an idealized optimal estimator with minimal variance as a benchmark, and then demonstrate that adaptive experimentation is necessary to achieve near-optimal estimation accuracy. Second, by incorporating the concept of doubly robust methods into sequential experimental design, we frame the optimal estimation problem as an online bandit learning problem, bridging the two fields of statistical estimation and bandit learning. Using tools and ideas from both bandit algorithm design and adaptive statistical estimation, we propose a general low-switching adaptive experiment framework, which could serve as a generic research paradigm for a wide range of adaptive experimental designs. Through novel lower-bound techniques for non-i.i.d. data, we demonstrate the optimality of our proposed experiment. Numerical results indicate that the estimation accuracy approaches the optimum with as few as two or three policy updates.
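The abstract does not spell out the estimator or the allocation rule, so the following is only a minimal sketch of the general idea: a doubly robust (AIPW) score combined with a low-switching design that updates its assignment policy once. It assumes discrete covariate strata, a hypothetical data-generating process (`TAU`, `SD`), and a Neyman-style allocation e(x) = σ1(x) / (σ0(x) + σ1(x)) clipped to [0.1, 0.9]. All function names are hypothetical, and this is not the paper's algorithm.

```python
import math
import random
import statistics

# Hypothetical data-generating process (illustrative only): three covariate
# strata with heterogeneous treatment effects and heterogeneous outcome noise.
TAU = {0: 1.0, 1: 2.0, 2: 3.0}                        # stratum-level effects; true ATE = 2.0
SD = {0: (1.0, 3.0), 1: (1.0, 1.0), 2: (1.0, 0.5)}    # (sd under control, sd under treatment)


def draw_unit(rng):
    """Sample a covariate stratum and both potential outcomes."""
    x = rng.randrange(3)
    y0 = x + rng.gauss(0.0, SD[x][0])
    y1 = x + TAU[x] + rng.gauss(0.0, SD[x][1])
    return x, y0, y1


def fit_arm_stats(batch):
    """Per-stratum, per-arm outcome mean and standard deviation from one batch."""
    stats = {}
    for x in range(3):
        for t in (0, 1):
            ys = [y for (xi, ti, y, _) in batch if xi == x and ti == t]
            if len(ys) >= 2:
                stats[(x, t)] = (statistics.fmean(ys), statistics.stdev(ys))
            else:
                stats[(x, t)] = (0.0, 1.0)  # fallback when a cell is too sparse
    return stats


def run_batch(rng, n, propensity, mu):
    """Assign n units with known propensities and return their AIPW scores."""
    batch, scores = [], []
    for _ in range(n):
        x, y0, y1 = draw_unit(rng)
        e = propensity[x]
        t = 1 if rng.random() < e else 0
        y = y1 if t else y0
        m0, m1 = mu[(x, 0)], mu[(x, 1)]
        # Doubly robust score: outcome-model difference plus an
        # inverse-propensity-weighted residual correction.
        psi = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
        batch.append((x, t, y, e))
        scores.append(psi)
    return batch, scores


def two_stage_experiment(n_pilot, n_main, seed=0, clip=0.1):
    rng = random.Random(seed)
    # Stage 1: uniform randomization; mu = 0 reduces the score to plain IPW.
    uniform = {x: 0.5 for x in range(3)}
    zero_mu = {(x, t): 0.0 for x in range(3) for t in (0, 1)}
    pilot, scores1 = run_batch(rng, n_pilot, uniform, zero_mu)
    # Single policy update (one "switch"): Neyman-style allocation per stratum.
    stats = fit_arm_stats(pilot)
    propensity = {}
    for x in range(3):
        s0, s1 = stats[(x, 0)][1], stats[(x, 1)][1]
        e = s1 / (s0 + s1) if s0 + s1 > 0 else 0.5
        propensity[x] = min(max(e, clip), 1 - clip)
    # Outcome models are fitted on stage 1 only, so they are fixed before stage 2.
    mu = {k: v[0] for k, v in stats.items()}
    _, scores2 = run_batch(rng, n_main, propensity, mu)
    scores = scores1 + scores2
    estimate = statistics.fmean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    return estimate, std_err, propensity


if __name__ == "__main__":
    est, se, prop = two_stage_experiment(n_pilot=1000, n_main=3000, seed=1)
    print(f"ATE estimate {est:.3f} (SE {se:.3f}); propensities {prop}")
```

Fitting the outcome models and propensities only on earlier batches keeps each score conditionally unbiased given the past. This is why a doubly robust estimate remains valid even though the data are not i.i.d. In this toy setup the updated policy assigns more units to the noisier arm in each stratum.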