Signal region detection is a challenging problem in modern statistics with broad applications, especially in genetic studies. We propose a novel approach that couples effectively with high-dimensional tests and is distinct from existing methods based on scan or knockoff statistics. The idea is to conduct binary segmentation with re-search and arrangement, guided by a sequence of dynamic tests, to increase detection accuracy and reduce computation. Theoretical and empirical studies show that our approach enjoys favorable theoretical guarantees under fewer restrictions and exhibits superior numerical performance with faster computation. Compared with scan-based methods, our procedure can detect shorter or longer regions with unbalanced signal strengths while allowing for more dependence structures. Relative to the knockoff framework, which controls only the false discovery rate, our approach attains higher detection accuracy while controlling the family-wise error rate. Applying the method to UK Biobank data to identify genetic regions related to breast cancer, we confirm previous findings and identify a number of new regions that show strong association with breast cancer risk and deserve further investigation.