Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power

Adaptive experiments such as multi-armed bandits offer efficiency gains over traditional randomized experiments but pose two major challenges: invalid inference on the Average Treatment Effect (ATE) due to adaptive sampling and low statistical power for suboptimal treatments. We address both issues by extending the Mixture Adaptive Design (MAD) framework (arXiv:2311.05794). First, we propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and preserves anytime-valid inference guarantees while substantially improving ATE precision. Second, we introduce MADMod, which dynamically reallocates samples to underpowered arms, enabling more balanced statistical power across treatments without sacrificing valid inference. Both methods retain MAD's core advantage of constructing asymptotic confidence sequences (CSs) that allow researchers to continuously monitor ATE estimates and stop data collection once a desired precision or significance criterion is met. We validate both methods using simulations and real-world data. In simulations, MADCovar reduces CS width by up to 60% relative to MAD. In a large-scale political randomized controlled trial (RCT) with 32,000 participants, MADCovar achieves similar precision gains. MADMod improves statistical power and inferential precision across all treatment arms, particularly for suboptimal treatments. Simulations show that MADMod sharply reduces Type II error while preserving the efficiency benefits of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments more practical, reliable, and efficient for applied researchers across many domains. Our proposed methods are implemented in an open-source software package.
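To make the idea of covariate adjustment under a mixture design concrete, the following is a minimal illustrative sketch. It is not the authors' MADCovar implementation or their software package. It assumes a two-arm setting in which the treatment probability at each step mixes a 50/50 Bernoulli design with a toy greedy bandit, and it compares a plain inverse-probability-weighted (IPW) ATE estimate with a generic augmented-IPW-style estimate whose outcome model is fit only on past data. The names `mixture_prob`, `RunningLine`, and `simulate`, the decay rate `t ** -0.24`, the bandit rule, and the data-generating process are all assumptions made for illustration. Construction of the confidence sequence itself is omitted.

```python
import random
import statistics


def mixture_prob(bandit_prob, delta_t):
    """P(treatment) when a 50/50 Bernoulli design is mixed with a bandit.

    delta_t is the weight on the Bernoulli design (hypothetical parameter name).
    """
    return delta_t * 0.5 + (1.0 - delta_t) * bandit_prob


class RunningLine:
    """Running least-squares fit of y on a scalar x, updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def mean_y(self):
        return self.sy / self.n if self.n else 0.0

    def predict(self, x):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 10 or denom <= 1e-12:
            return self.mean_y()  # too little data for a stable slope
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        return self.mean_y() + slope * (x - self.sx / self.n)


def simulate(n=5000, true_ate=1.0, seed=0):
    """Simulate one adaptive experiment; return IPW and adjusted ATE estimates."""
    rng = random.Random(seed)
    models = {0: RunningLine(), 1: RunningLine()}
    ipw_terms, adj_terms = [], []
    for t in range(1, n + 1):
        delta_t = t ** -0.24  # illustrative decay rate (assumption)
        # Toy bandit (assumption): favour the arm with the higher running mean.
        better = models[1].mean_y() > models[0].mean_y()
        p = mixture_prob(0.9 if better else 0.1, delta_t)

        x = rng.gauss(0.0, 1.0)                      # pre-treatment covariate
        w = 1 if rng.random() < p else 0             # assignment
        y = true_ate * w + 2.0 * x + rng.gauss(0.0, 1.0)

        # Predictions use only data observed before step t.
        pred1, pred0 = models[1].predict(x), models[0].predict(x)
        ipw_terms.append(w * y / p - (1 - w) * y / (1 - p))
        adj_terms.append(
            pred1 - pred0
            + w * (y - pred1) / p
            - (1 - w) * (y - pred0) / (1 - p)
        )
        models[w].update(x, y)
    return {
        "ipw": statistics.fmean(ipw_terms),
        "adjusted": statistics.fmean(adj_terms),
        "ipw_var": statistics.variance(ipw_terms),
        "adjusted_var": statistics.variance(adj_terms),
    }


if __name__ == "__main__":
    print(simulate())
```

Because the assignment probability `p` is known at every step and the outcome predictions depend only on earlier observations, each adjusted term has the same conditional expectation as the IPW term, while its variance shrinks when the covariate predicts the outcome. In this toy setup, that is the mechanism by which covariate adjustment can narrow interval estimates.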
