Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this can cause standard variance estimators for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after experiments conclude.
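For orientation, the baseline that the abstract says can fail is the standard sandwich variance estimator for a Z-estimator, which treats user trajectories as i.i.d. The sketch below is a minimal illustration of that baseline only, not of the adaptive sandwich estimator introduced in this work, whose form the abstract does not give. It assumes the simplest possible setting: a scalar parameter (the mean outcome) with per-user estimating function psi_i(theta) = sum_t (y_it - theta). The function names and the toy setting are hypothetical and chosen purely for illustration.

```python
from statistics import fmean


def z_estimate_mean(trajectories):
    """Solve sum_i psi_i(theta) = 0 for psi_i(theta) = sum_t (y_it - theta).

    Assumption (illustrative only): each trajectory is a list of scalar
    outcomes for one user, and the target parameter is the overall mean.
    """
    total = sum(sum(traj) for traj in trajectories)
    count = sum(len(traj) for traj in trajectories)
    return total / count


def iid_sandwich_variance(trajectories, theta_hat):
    """Standard (non-adaptive) sandwich variance estimate for theta_hat.

    Computes A^{-1} B A^{-1} / n, treating the n user trajectories as
    i.i.d. draws. This is the baseline estimator, not the adaptive
    correction described in the abstract.
    """
    n = len(trajectories)
    # Estimating function evaluated at theta_hat: one scalar per user.
    psi = [sum(y - theta_hat for y in traj) for traj in trajectories]
    # "Bread" A: average derivative of psi_i in theta, which is -T_i here.
    bread = fmean([-float(len(traj)) for traj in trajectories])
    # "Meat" B: average squared estimating function.
    meat = fmean([p * p for p in psi])
    return meat / (bread * bread) / n


# Toy data: four users with two outcomes each (made-up numbers).
toy = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [3.0, 5.0]]
theta_hat = z_estimate_mean(toy)
variance_hat = iid_sandwich_variance(toy, theta_hat)
```

When an algorithm pools data across users to choose treatments, the trajectories are no longer independent, so a quantity computed this way can understate the true variance; that gap is what the adaptive sandwich estimator is designed to correct.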