Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this dependence can cause standard variance estimators designed for i.i.d. data to underestimate the true variance of common estimators computed on such data. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich variance estimator, a corrected sandwich estimator that yields consistent variance estimates under adaptive sampling. Additionally, to prove our results, we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data, which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after the experiments conclude.
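For reference, the classical i.i.d. sandwich variance estimator, the kind of standard estimator the abstract says can underestimate variance under pooled adaptive sampling, can be sketched for a scalar Z-estimator as below. This is a minimal sketch under stated assumptions, not the authors' adaptive sandwich estimator, whose correction is not specified here. The function names `z_estimate` and `sandwich_variance`, the Newton-method solver, and the regression-through-the-origin estimating function are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence, Tuple

Datum = Tuple[float, float]  # hypothetical (x, y) observation


def z_estimate(
    data: Sequence[Datum],
    psi: Callable[[Datum, float], float],
    dpsi: Callable[[Datum, float], float],
    theta0: float = 0.0,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> float:
    """Solve (1/n) * sum_i psi(d_i; theta) = 0 for scalar theta via Newton's method."""
    n = len(data)
    theta = theta0
    for _ in range(max_iter):
        g = sum(psi(d, theta) for d in data) / n   # averaged estimating equation
        h = sum(dpsi(d, theta) for d in data) / n  # its derivative in theta
        step = g / h
        theta -= step
        if abs(step) < tol:
            break
    return theta


def sandwich_variance(
    data: Sequence[Datum],
    psi: Callable[[Datum, float], float],
    dpsi: Callable[[Datum, float], float],
    theta: float,
) -> float:
    """Classical i.i.d. sandwich estimate of Var(theta_hat): A^{-1} B A^{-1} / n."""
    n = len(data)
    a = sum(dpsi(d, theta) for d in data) / n       # "bread": mean derivative of psi
    b = sum(psi(d, theta) ** 2 for d in data) / n   # "meat": mean squared psi, assumes independent summands
    return b / (a * a * n)


# Hypothetical example: least squares through the origin, psi = x * (y - theta * x).
def psi_ls(d: Datum, theta: float) -> float:
    x, y = d
    return x * (y - theta * x)


def dpsi_ls(d: Datum, theta: float) -> float:
    x, _ = d
    return -x * x


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]
theta_hat = z_estimate(data, psi_ls, dpsi_ls)
var_hat = sandwich_variance(data, psi_ls, dpsi_ls, theta_hat)
```

Here `theta_hat` equals sum(x*y)/sum(x*x) = 29/14, and `var_hat` reduces to sum(x^2 e^2)/(sum(x^2))^2. The "meat" term averages squared per-observation contributions as if they were independent; that independence is the assumption that pooling across users violates.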