Assumption-lean weak limits and tests for two-stage adaptive experiments

Adaptive experiments are increasingly popular in real-world applications because data-driven sampling can effectively maximize in-sample welfare and efficiency. Despite their growing prevalence, the statistical foundations for valid inference in such settings remain underdeveloped. Focusing on two-stage adaptive experimental designs, we address this gap by deriving new weak convergence results for mean outcomes and their differences. Our results apply to a broad class of estimators, the weighted inverse probability weighted (WIPW) estimators. In contrast to prior work, they require significantly weaker assumptions and sharply characterize phase transitions in the limiting behavior across different signal regimes. Through this common lens, our general results unify previously fragmented results for the two-stage setup. We further establish quantitative convergence rates in the bounded-Lipschitz distance that reveal the fundamental trade-off between exploitation and inferential stability. Because the limiting distributions may be non-normal, we propose a computationally efficient and provably valid simulation-based method for obtaining their critical values under the null, enabling practical hypothesis testing. Our results and approaches are general enough to accommodate various adaptive experimental designs, including batched bandit and subgroup enrichment experiments. Simulations and semi-synthetic studies demonstrate the practical value of our approach and reveal that neither normality-based nor non-normality-based tests uniformly dominate in power; the relative advantage depends on the structure of the outcome distribution.
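To make the WIPW family concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's: two arms, unit-variance Gaussian outcomes, a uniform first stage followed by an epsilon-greedy second stage, and the generic form mu_hat(a) = sum_i h_i Gamma_i / sum_i h_i with IPW scores Gamma_i = 1{A_i = a} Y_i / e_i(a) and square-root-propensity weights h_i = sqrt(e_i(a)). The helper names `wipw_mean`, `run_two_stage`, `wipw_difference`, and `null_quantiles` are hypothetical. `null_quantiles` is a brute-force Monte Carlo of the null distribution for one fully specified design, included only to show where simulated critical values come from; it is not the paper's simulation method.

```python
import numpy as np


def wipw_mean(y, a, e, arm, h):
    """Weighted IPW mean of `arm`: sum(h * Gamma) / sum(h).

    y: outcomes; a: assigned arms (0/1); e: per-unit probability of being
    assigned to `arm`; h: nonnegative per-unit weights (hypothetical choice).
    """
    gamma = (a == arm) * y / e  # per-unit IPW score Gamma_i
    return float(np.sum(h * gamma) / np.sum(h))


def run_two_stage(n1, n2, mu, rng, eps=0.2):
    """Hypothetical two-stage design: uniform stage 1, then an epsilon-greedy
    stage 2 favouring the arm with the larger stage-1 IPW mean.
    Returns outcomes, assigned arms, and each unit's P(A_i = 1)."""
    mu = np.asarray(mu, dtype=float)
    # Stage 1: each arm with probability 1/2.
    a1 = rng.integers(0, 2, size=n1)
    y1 = rng.normal(mu[a1], 1.0)
    stage1 = [np.mean((a1 == k) * y1 / 0.5) for k in (0, 1)]
    # Stage 2: assignment probability depends only on stage-1 data.
    p_arm1 = 1 - eps / 2 if stage1[1] > stage1[0] else eps / 2
    a2 = (rng.random(n2) < p_arm1).astype(int)
    y2 = rng.normal(mu[a2], 1.0)
    p1 = np.concatenate([np.full(n1, 0.5), np.full(n2, p_arm1)])
    return np.concatenate([y1, y2]), np.concatenate([a1, a2]), p1


def wipw_difference(y, a, p1):
    """WIPW estimate of mu_1 - mu_0 with weights h_i = sqrt(e_i(arm))."""
    m1 = wipw_mean(y, a, p1, 1, np.sqrt(p1))
    m0 = wipw_mean(y, a, 1 - p1, 0, np.sqrt(1 - p1))
    return m1 - m0


def null_quantiles(n1, n2, reps, rng, levels=(0.025, 0.975)):
    """Brute-force Monte Carlo quantiles of sqrt(n) * (WIPW difference)
    under H0: mu_0 = mu_1 = 0 for this particular design."""
    n = n1 + n2
    stats = [
        np.sqrt(n) * wipw_difference(*run_two_stage(n1, n2, [0.0, 0.0], rng))
        for _ in range(reps)
    ]
    return np.quantile(stats, levels)
```

For example, `run_two_stage(50_000, 50_000, [0.0, 0.5], np.random.default_rng(0))` followed by `wipw_difference` should return a value near 0.5. The square-root weights keep the conditional variance of each term h_i Gamma_i roughly constant across stages. Because the stage-2 probabilities depend only on stage-1 data, each term stays conditionally unbiased. Rerunning `null_quantiles` with other sample sizes or epsilon values shows how the null distribution of the statistic depends on the design.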

翻译：自适应实验因能通过数据驱动抽样有效最大化样本内福利与效率，在实际应用中日益普及。然而，尽管其应用范围不断扩大，此类场景下有效推断的统计基础仍欠完善。本文聚焦两阶段自适应实验设计，通过推导均值结果及其差异的新弱收敛定理来填补这一空白。具体而言，我们的结果适用于广义的加权逆概率加权（WIPW）估计量族。与既往研究相比，我们的结果所需假设显著更弱，并清晰刻画了不同信号强度下极限行为的相变特征。基于这一统一视角，我们的通用结论整合了两阶段框架下先前分散的研究成果。进一步地，我们在有界李普希茨距离下建立了量化收敛速率，揭示了"利用"与推断稳定性之间的根本权衡。为应对潜在非正态极限给推断带来的挑战，我们提出了一种计算高效且可验证有效的模拟方法，用于在原假设下获取非正态极限分布的临界值，从而实现实用的假设检验。我们的结果与方法具有足够通用性，可适用于各类自适应实验设计，包括分批式老虎机与亚组富集实验。仿真与半仿真研究验证了本方法的实用价值，并揭示基于正态性检验与非正态性检验的方法在功效上均非绝对占优，其相对优势取决于结果分布的结构特征。