Sequential hypothesis tests are widely adopted as a principled way to perform multiple tests on data that arrives over time. In particular, researchers frequently use group sequential hypothesis tests (GST) to test the same hypotheses at K points in time, or "groups," while data arrives sequentially. In this setting, many methods, often framed as alpha-spending budgets, have been proposed to let researchers control the type-1 error uniformly across the K checks. Although all of these methods validly control the uniform type-1 error, it is unclear which of them is optimal when the goal is to reject the null as soon as possible. In this paper, we directly optimize the rejection criterion in the GST setting subject to the same constraints on type-1 and type-2 errors. We solve this problem with an approach that combines sample average approximation with mixed-integer linear programming (S-MILP), and we show that it dominates classical GST procedures such as the Lan-DeMets, Pocock, and O'Brien-Fleming methods. We also find that the optimal solution typically spends the alpha budget aggressively early on, shedding light on the long-standing debate over which alpha-spending budgets are more efficient. Finally, we apply our optimal S-MILP approach to a recent study on acute kidney injury interventions and find that it reaches the same statistically significant conclusion faster than the original study and other GST methods.
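To make the notion of an alpha-spending budget concrete, the sketch below evaluates the two standard Lan-DeMets spending functions (the O'Brien-Fleming type and the Pocock type) at K equally spaced checks. It illustrates only the classical baselines named above, not the S-MILP procedure. The function names, the two-sided form of the O'Brien-Fleming-type function, the equal spacing of the checks, and the default alpha = 0.05 are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def obf_type_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t in (0, 1],
    Lan-DeMets O'Brien-Fleming-type function (two-sided form, assumed here)."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t in (0, 1],
    Lan-DeMets Pocock-type function."""
    return alpha * math.log(1 + (math.e - 1) * t)


def cumulative_budget(spending_fn, K, alpha=0.05):
    """Cumulative alpha spent at each of K equally spaced checks (assumed spacing)."""
    return [spending_fn(k / K, alpha) for k in range(1, K + 1)]


if __name__ == "__main__":
    K = 5
    for name, fn in [("OBF-type", obf_type_spending),
                     ("Pocock-type", pocock_type_spending)]:
        budget = cumulative_budget(fn, K)
        print(name, [round(a, 5) for a in budget])
```

Both functions spend the full alpha by the final check, but the Pocock-type budget releases far more of it at the early checks, whereas the O'Brien-Fleming-type budget holds nearly all of it until late. That contrast is the early-versus-late spending trade-off the abstract refers to.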