We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task finds a variety of applications, including off-policy evaluation (\textsf{OPE}) in contextual bandits and estimation of the average treatment effect (\textsf{ATE}) in causal inference. While a certain class of augmented inverse propensity weighting (\textsf{AIPW}) estimators enjoys desirable asymptotic properties, including semiparametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of \textsf{AIPW} estimators that crucially depend on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we also propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning, so as to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations: (\romannumeral 1) the tabular case; (\romannumeral 2) the case of linear function approximation; and (\romannumeral 3) the case of general function approximation for the outcome model. We then provide a local minimax lower bound showing the instance-dependent optimality of the \textsf{AIPW} estimator combined with no-regret online learning algorithms.
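For concreteness, here is a minimal sketch of a representative estimator in this class for the \textsf{OPE} setting, under assumed notation (these symbols are illustrative, not fixed by the text above): triples $(x_t, a_t, y_t)$ are logged by a known adaptive behavior policy $e_t$, $\pi$ denotes the target policy, and $\widehat{f}_t$ is an outcome-model estimate built only from data observed before round $t$:
\[
\widehat{\tau}_T \;=\; \frac{1}{T}\sum_{t=1}^{T}\Bigg(\sum_{a\in\mathcal{A}} \pi(a\mid x_t)\,\widehat{f}_t(x_t,a) \;+\; \frac{\pi(a_t\mid x_t)}{e_t(a_t\mid x_t)}\Big(y_t-\widehat{f}_t(x_t,a_t)\Big)\Bigg).
\]
The inverse-propensity-weighted correction term removes the bias of the plug-in term whenever the propensities $e_t$ are known, and the resulting mean-squared error is driven by how well the sequence $\widehat{f}_1,\widehat{f}_2,\dots$ tracks the true outcome model; this tracking error is precisely the sequentially weighted quantity that the online-learning reduction is designed to minimize.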