Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of controlling a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complication is that hypotheses are tested online, one by one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years, showing how to control error rates in online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based on both the evidence against it and the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, reviewing the key theory and focusing on applied examples. We also provide simulation results comparing different online testing algorithms, together with an up-to-date overview of the many methodological extensions that have been proposed.