The detection of weak and rare effects in large amounts of data arises in a number of modern data analysis problems. Known results show that in this situation the potential of statistical inference is severely limited by the large-scale multiple testing that is inherent in these problems. Here we show that fundamentally more powerful statistical inference is possible when there is some structure in the signal that can be exploited, e.g., if the signal is clustered in many small blocks, as is the case in some relevant applications. We derive the detection boundary in such a situation, where we allow both the number of blocks and the block length to grow polynomially with the sample size. We derive these results for both the univariate and the multivariate settings, as well as for the problem of detecting clusters in a network. These results recover as special cases the sparse mixture detection problem (Donoho and Jin, 2004), where there is no structure in the signal, and the scan problem (Chan and Walther, 2013), where the signal comprises a single interval. We develop methodology that allows optimal adaptive detection in the general setting, thus exploiting the structure if it is present without incurring a relevant penalty in the case where there is no structure. The advantage of this methodology can be considerable: in the case of no structure the means need to increase at the rate $\sqrt{\log n}$ to ensure detection, while the presence of structure allows detection even if the means \emph{decrease} at a polynomial rate.
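The contrast in rates stated in the final sentence can be made concrete with a back-of-the-envelope calculation. The sketch below is only a heuristic and is not the detection boundary derived in the paper. It assumes unit-variance Gaussian noise, a single block of known length, and a noise level of roughly $\sqrt{2 \log n}$ for the maximum over about $n$ standardized window sums. The function name `heuristic_mean_threshold` and the choice of block length $\sqrt{n}$ are hypothetical, introduced purely for illustration.

```python
import math


def heuristic_mean_threshold(n: int, block_length: int) -> float:
    """Rough mean size needed for a block to stand out from the noise.

    Assumption (illustrative only): a block of `block_length` observations,
    each with mean mu and unit variance, has a standardized sum with mean
    mu * sqrt(block_length). We ask when this reaches sqrt(2 * log(n)),
    the approximate size of the maximum of about n standard normal
    statistics.
    """
    return math.sqrt(2.0 * math.log(n) / block_length)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        # No structure: each observation is tested on its own (length 1).
        unstructured = heuristic_mean_threshold(n, 1)
        # Structure: block length grows polynomially, here as sqrt(n).
        structured = heuristic_mean_threshold(n, math.isqrt(n))
        print(f"n={n:>8}  no structure: {unstructured:.3f}  "
              f"block length sqrt(n): {structured:.3f}")
```

Under these assumptions the required mean grows like $\sqrt{\log n}$ when the block length is 1, and shrinks at a polynomial rate (up to a logarithmic factor) once the block length grows polynomially in $n$.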