Research on localizing the genetic basis of diseases and traits has been conducted widely over the last few decades. Scan methods have been developed for region-based analysis in whole-genome association studies, helping us better understand how genetics influences human diseases or traits, especially when the aggregated effects of multiple causal variants are present. In this paper, we propose a fast and effective algorithm, coupled with a high-dimensional test, for simultaneously detecting multiple signal regions; it is distinct from existing methods that use scan or knockoff statistics. The idea is to conduct binary splitting with re-search and arrangement based on a sequence of dynamic critical values, which increases detection accuracy and reduces computation. Theoretical and empirical studies demonstrate that our approach enjoys favorable theoretical guarantees under fewer restrictions and exhibits superior numerical performance with faster computation. Using UK Biobank data to identify genetic regions related to breast cancer, we confirm previous findings and also identify a number of new regions that suggest a strong association with breast cancer risk and deserve further investigation.