Response-adaptive randomization (RAR) can increase participant benefit in clinical trials, but it also complicates statistical analysis. A burn-in period (a non-adaptive initial stage) is commonly used to mitigate this disadvantage, yet guidance on its optimal duration is scarce. To address this gap, this paper introduces an exact evaluation approach to investigate how the burn-in length affects the statistical operating characteristics of two-arm Bayesian RAR (BRAR) designs with binary outcomes. We show that (1) commonly used calibration and asymptotic tests exhibit substantial type I error rate inflation for BRAR designs without a burn-in period, and increasing the total burn-in length to more than half the trial size reduces but does not fully eliminate this inflation, so exact tests are necessary; (2) exact tests conditioning on the total number of successes have the highest average and minimum power up to large burn-in lengths; (3) the burn-in length substantially influences power and participant benefit, which are often not maximized at either the minimum or the maximum possible burn-in length; (4) the choice of test statistic affects the type I error rate and power; and (5) estimation bias decreases more quickly with increasing burn-in length for larger treatment effects and, for a fixed burn-in length, increases with trial size. Our approach is illustrated by redesigning the ARREST trial.
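"Exact evaluation" here means computing operating characteristics from the full distribution of trial outcomes rather than from simulation. The Python code below is a minimal sketch of that idea under assumptions the abstract does not specify. It propagates the exact distribution of the trial state (n0, s0, n1, s1), meaning participants and successes per arm, one participant at a time. The first `burn_in` participants are randomized 1:1. After that, arm 1 is chosen with probability P(p1 > p0 | data) under independent Beta(1, 1) priors. The code then sums the probability of every final state in which an unpooled two-sided Wald z-test rejects. The allocation rule, priors, test, and all function names are hypothetical illustrations, not the paper's method. The posterior probability uses a standard closed form that is valid for integer Beta parameters.

```python
import math
from collections import defaultdict


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_arm1_better(a0, b0, a1, b1):
    """P(p1 > p0) for independent p0 ~ Beta(a0, b0), p1 ~ Beta(a1, b1).

    Closed form that requires a1 to be a positive integer.
    """
    total = 0.0
    for i in range(a1):
        total += math.exp(
            log_beta(a0 + i, b0 + b1)
            - math.log(b1 + i)
            - log_beta(1 + i, b1)
            - log_beta(a0, b0)
        )
    return min(max(total, 0.0), 1.0)  # guard against rounding outside [0, 1]


def exact_state_distribution(T, burn_in, p0, p1):
    """Exact distribution of (n0, s0, n1, s1) after T participants.

    Hypothetical design: participants 1..burn_in are allocated 1:1, and later
    participants go to arm 1 with probability P(p1 > p0 | data) under
    Beta(1, 1) priors on both arms.
    """
    dist = {(0, 0, 0, 0): 1.0}
    for t in range(T):
        new = defaultdict(float)
        for (n0, s0, n1, s1), w in dist.items():
            if t < burn_in:
                pi1 = 0.5  # non-adaptive burn-in stage
            else:
                pi1 = prob_arm1_better(1 + s0, 1 + n0 - s0, 1 + s1, 1 + n1 - s1)
            # Branch on the allocated arm, then on the binary outcome.
            new[(n0 + 1, s0 + 1, n1, s1)] += w * (1 - pi1) * p0
            new[(n0 + 1, s0, n1, s1)] += w * (1 - pi1) * (1 - p0)
            new[(n0, s0, n1 + 1, s1 + 1)] += w * pi1 * p1
            new[(n0, s0, n1 + 1, s1)] += w * pi1 * (1 - p1)
        dist = dict(new)
    return dist


def wald_rejects(n0, s0, n1, s1, z_crit=1.959964):
    """Two-sided unpooled Wald test of H0: p0 == p1.

    Assumed convention: never reject when an arm is empty or the
    estimated variance is zero.
    """
    if n0 == 0 or n1 == 0:
        return False
    q0, q1 = s0 / n0, s1 / n1
    var = q0 * (1 - q0) / n0 + q1 * (1 - q1) / n1
    if var == 0:
        return False
    return abs(q1 - q0) / math.sqrt(var) > z_crit


def exact_rejection_rate(T, burn_in, p0, p1):
    """Exact rejection probability: type I error if p0 == p1, else power."""
    dist = exact_state_distribution(T, burn_in, p0, p1)
    return sum(w for state, w in dist.items() if wald_rejects(*state))


if __name__ == "__main__":
    # Assumed scenario: 20 participants, common success rate 0.3.
    for b in (0, 10, 20):
        print(b, exact_rejection_rate(20, b, 0.3, 0.3))
```

Setting `p0 == p1` yields an exact type I error rate for a given burn-in length, and `p0 != p1` yields exact power, with no Monte Carlo error. The number of reachable states grows roughly cubically with the number of participants, so this brute-force enumeration is practical only for small to moderate trial sizes.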