Genome-wide association studies (GWAS) often find association signals between many genetic variants and traits of interest in a genomic region. Functional annotations of these variants provide valuable prior information that helps prioritize biologically relevant variants and enhances the power to detect causal variants. However, due to substantial correlations among these variants, a critical question is how to rigorously control the false discovery rate while effectively leveraging prior knowledge. We introduce annotation-informed knockoffs (AnnoKn), a knockoff-based method that performs annotation-informed variable selection with strict control of the false discovery rate. AnnoKn integrates the knockoff procedure with adaptive Lasso regression to evaluate the importance of multiple covariates while incorporating functional annotation information within a unified Bayesian framework. To facilitate real-world applications where individual-level data are not accessible, we further extend AnnoKn to operate on summary statistics. Through simulations and real-world applications to GTEx and GWAS datasets, we show that AnnoKn achieves superior power in detecting causal genetic variants compared with existing annotation-informed variable selection methods, while maintaining valid control over false discoveries.
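To make the false discovery rate control mentioned above concrete, the following is a minimal sketch of the generic knockoff+ selection step from the standard knockoff filter, not AnnoKn's own implementation. It assumes each variant already has a feature statistic W_j (for example, the difference between the absolute coefficient of the original variant and that of its knockoff copy), with large positive values indicating evidence for the original variant. How AnnoKn computes these statistics from annotations is not described in this abstract, so the function names and input values here are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns infinity when no such t exists (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics stand in for false positives; the +1 makes
        # the estimate conservative (the "knockoff+" variant).
        est_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Return indices of variables whose statistic meets the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variants (illustrative values only).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.3, -0.2, 2.0, 1.5]
selected = knockoff_select(w_example, q=0.2)
```

In this sketch, a stricter target level raises the threshold and can result in no variants being selected, which reflects the trade-off between power and false discovery control that annotation information is meant to ease.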