Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one-by-one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based on both the evidence against it and the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
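To make the online framework concrete, the following is a minimal sketch of one simple rule from the online FDR literature, in the style of the LOND procedure. It is an illustration under stated assumptions, not the specific algorithms compared in this paper: the function name `lond_sketch`, the target level `alpha = 0.05`, and the choice of a weight sequence proportional to 1/t^2 are all assumptions made for this example. Each p-value is compared with a threshold that grows with the number of earlier rejections, so the decision at time t depends on both the current evidence and the past decisions.

```python
import math


def lond_sketch(p_values, alpha=0.05):
    """Illustrative LOND-style online test (hypothetical helper, not from the paper).

    The test level at time t is gamma_t * (number of previous rejections + 1),
    where the nonnegative weights gamma_t sum to alpha over all t >= 1.
    """
    # Assumed weight sequence: gamma_t = alpha * 6 / (pi^2 * t^2).
    # Since sum_{t>=1} 1/t^2 = pi^2 / 6, these weights sum to alpha.
    norm = 6.0 / math.pi ** 2

    decisions = []
    num_rejections = 0
    for t, p in enumerate(p_values, start=1):
        gamma_t = alpha * norm / t ** 2
        # Earlier rejections raise the threshold for the current hypothesis.
        alpha_t = gamma_t * (num_rejections + 1)
        reject = p <= alpha_t
        decisions.append(reject)
        num_rejections += int(reject)
    return decisions


if __name__ == "__main__":
    # The third p-value is rejected only because the first hypothesis was.
    print(lond_sketch([0.001, 0.5, 0.005, 0.8]))
    print(lond_sketch([0.5, 0.5, 0.005, 0.8]))
```

The two calls differ only in the first p-value, yet the decision for the third hypothesis changes, which shows how previous rejections feed into later test levels. Unlike the Benjamini-Hochberg procedure, the rule never needs to see future p-values.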