E-values for Adaptive Clinical Trials: Anytime-Valid Monitoring in Practice

Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules in a two-arm binary-endpoint trial: naive repeated testing, a calibrated group sequential rule, a naive posterior threshold, a calibrated Bayesian rule, and an e-value rule. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package `evalinger` and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.
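To make the betting-martingale idea concrete, the following is a minimal Python sketch of one possible e-process for a two-arm binary-endpoint trial. It is an illustration under stated assumptions, not the construction developed in the paper and not the `evalinger` API: it assumes fixed 1:1 randomization, a one-sided null that the treatment response rate does not exceed the control rate, and a constant bet size. The names `betting_e_process`, `first_rejection`, and `lam` are hypothetical. Under those assumptions each factor has conditional expectation at most 1 under the null, so the running wealth is a nonnegative supermartingale and, by Ville's inequality, it crosses 1/alpha with probability at most alpha no matter how often it is inspected.

```python
def betting_e_process(arms, outcomes, lam=0.5):
    """Running wealth of a fixed-fraction bet against H0: p_treatment <= p_control.

    Illustrative assumptions (not taken from the paper):
      - arms[i] is 1 for treatment and 0 for control, assigned by fixed 1:1 randomization;
      - outcomes[i] is 1 for a response and 0 otherwise;
      - lam is a constant bet size in [0, 1).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    wealth = 1.0
    path = []
    for arm, outcome in zip(arms, outcomes):
        # Payoff is +1 for a treated responder, -1 for a control responder, 0 otherwise.
        # Under H0 with 1:1 randomization its conditional mean is (p_T - p_C) / 2 <= 0.
        payoff = (2 * arm - 1) * outcome
        wealth *= 1.0 + lam * payoff
        path.append(wealth)
    return path


def first_rejection(path, alpha=0.05):
    """Return the 1-based index of the first time wealth reaches 1/alpha, else None."""
    threshold = 1.0 / alpha
    for t, wealth in enumerate(path, start=1):
        if wealth >= threshold:
            return t
    return None
```

A monitoring loop would call `betting_e_process` on the data accrued so far and stop for efficacy the first time `first_rejection` returns an index. A constant `lam` keeps the sketch short; a bet size chosen from past data only would also preserve validity and would typically grow wealth faster under the alternative.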
