The Benjamini-Hochberg (BH) procedure remains widely popular despite having limited theoretical guarantees in the commonly encountered scenario of correlated test statistics. Of particular concern is the possibility that the method could exhibit bursty behavior, meaning that it might typically yield no false discoveries while occasionally yielding both a large number of false discoveries and a false discovery proportion (FDP) that far exceeds its own well-controlled mean. In this paper, we investigate which test statistic correlation structures lead to bursty behavior and which ones lead to well-controlled FDPs. To this end, we develop a central limit theorem for the FDP in a multiple testing setup where the test statistic correlations can be either short-range or long-range as well as either weak or strong. The theorem and our simulations from a data-driven factor model suggest that the BH procedure exhibits severe burstiness when the test statistics have many strong, long-range correlations, but does not otherwise.
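As a rough illustration of what burstiness means here, the following minimal sketch applies the BH step-up rule to one-sided z-tests whose statistics share a single common factor. The equicorrelated one-factor model, the function names `bh_reject` and `simulate_fdp`, and all parameter values (`rho`, `m`, `m1`, `mu`, `q`) are illustrative assumptions and are not the data-driven factor model or the settings used in the paper.

```python
import math
import random


def bh_reject(pvals, q=0.1):
    """Benjamini-Hochberg step-up rule: return indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose sorted p-value is at most q * rank / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return order[:k]


def simulate_fdp(rho, m=500, m1=50, mu=3.0, q=0.1, rng=random):
    """One draw of the FDP under an assumed one-factor (equicorrelated) model."""
    factor = rng.gauss(0.0, 1.0)  # shared factor: every pair has correlation rho
    pvals = []
    for i in range(m):
        noise = rng.gauss(0.0, 1.0)
        z = math.sqrt(rho) * factor + math.sqrt(1.0 - rho) * noise
        if i < m1:
            z += mu  # assumption: the first m1 hypotheses are true signals
        pvals.append(0.5 * math.erfc(z / math.sqrt(2.0)))  # one-sided p-value
    rejected = bh_reject(pvals, q)
    false_discoveries = sum(1 for i in rejected if i >= m1)
    return false_discoveries / max(len(rejected), 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for rho in (0.0, 0.8):
        fdps = [simulate_fdp(rho, rng=rng) for _ in range(200)]
        mean_fdp = sum(fdps) / len(fdps)
        share_zero = sum(1 for f in fdps if f == 0.0) / len(fdps)
        print(f"rho={rho}: mean FDP={mean_fdp:.3f}, "
              f"share with FDP=0: {share_zero:.2f}, max FDP={max(fdps):.3f}")
```

Comparing the mean, the share of runs with no false discoveries, and the maximum FDP across the two values of `rho` gives an informal feel for the contrast between independent and strongly, globally correlated statistics. It is not a substitute for the central limit theorem described above.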