The recent explosion of genetic, high-dimensional biobank, and 'omic' data has given researchers the opportunity to investigate the shared genetic origin (pleiotropy) of hundreds to thousands of related phenotypes. However, existing methods for multi-phenotype genome-wide association studies (GWAS) do not model pleiotropy, are applicable only to a small number of phenotypes, or provide no way to perform inference. To complicate matters further, raw genetic and phenotype data are rarely available, so analyses must be performed on GWAS summary statistics, whose statistical properties in high dimensions are poorly understood. We therefore developed a novel model, theoretical framework, and set of methods for Bayesian inference in GWAS of high-dimensional phenotypes from summary statistics. These methods explicitly model pleiotropy, enable fast computation, and facilitate the use of biologically informed priors. We demonstrate the utility of our procedure by applying it to metabolite GWAS, where we develop new nonparametric priors for genetic effects on metabolite levels that use known metabolic pathway information and foster interpretable inference at the pathway level.