Adaptive treatment assignment algorithms, such as bandit algorithms, are increasingly used in clinical trials of digital health interventions. Data collected from these trials are frequently used for causal inference and related analyses that inform how to refine the intervention and whether to roll it out more broadly. This work studies inference for estimands that depend on the adaptive algorithm itself; a simple example is the mean reward under the adaptive algorithm. Specifically, we investigate the replicability of statistical analyses of such estimands when the data come from trials that deploy adaptive treatment assignment algorithms. We demonstrate that many standard statistical estimators can be inconsistent and can fail to replicate across repetitions of the clinical trial, even as the sample size grows large. We show that this non-replicability is closely tied to properties of the adaptive algorithm itself. We introduce a formal definition of a "replicable bandit algorithm" and prove that under such algorithms, a wide variety of common statistical estimators are guaranteed to be consistent and asymptotically normal. We present both theoretical results and simulation studies based on a mobile health intervention for oral health self-care. Our findings underscore the importance of designing adaptive algorithms with replicability in mind, especially in settings like digital health, where deployment decisions rely heavily on replicated evidence. We conclude by discussing open questions on the connections between algorithm design, statistical inference, and experimental replicability.