Polygenic risk scores (PRS) developed from genome-wide association studies (GWAS) can be used for risk stratification by quantifying the genetic contribution to disease, and many clinical applications have been proposed. Bayesian methods are popular for building PRS because they naturally regularize models and incorporate external information. In this article, we present new theoretical results, methods, and extensive numerical studies to advance Bayesian methods for PRS applications. We show that, under a common Bayesian PRS framework, the posterior can be improper when the required GWAS summary statistics and linkage disequilibrium (LD) data come from distinct sources. As a principled remedy, we propose a projection of the summary statistics that ensures compatibility between the two sources and, in turn, a proper posterior. We further introduce a new PRS method, with accompanying software, based on the less-explored Bayesian bridge prior, which models varying sparsity levels in effect-size distributions more flexibly. We benchmark it extensively against alternative Bayesian methods on synthetic and real datasets, quantifying the impact of prior specification and LD estimation strategy. Our proposed PRS-Bridge, equipped with the projection technique and the flexible prior, shows the most consistent and generally superior performance across a variety of scenarios.
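To illustrate the compatibility issue, the following is a minimal sketch and not the authors' implementation. It assumes that the projection is an orthogonal projection of the marginal effect estimates onto the column space of the reference LD matrix; the abstract does not specify the exact form. The function name `project_onto_ld_range`, the tolerance `rel_tol`, and the toy data are all hypothetical.

```python
import numpy as np


def project_onto_ld_range(ld, beta_marginal, rel_tol=1e-8):
    """Project summary statistics onto the column space of an LD matrix.

    Assumption (not stated in the abstract): compatibility means that the
    summary statistics have no component in the null space of the LD matrix.
    """
    # Symmetric eigendecomposition of the LD (correlation) matrix.
    eigvals, eigvecs = np.linalg.eigh(ld)
    # Keep eigenvectors whose eigenvalues are numerically non-zero.
    keep = eigvals > rel_tol * eigvals.max()
    basis = eigvecs[:, keep]
    # Orthogonal projection onto the span of the retained eigenvectors.
    return basis @ (basis.T @ beta_marginal)


# Toy example: a rank-deficient LD matrix estimated from a small reference panel.
rng = np.random.default_rng(0)
ref = rng.standard_normal((5, 8))            # 5 individuals, 8 variants
ref = (ref - ref.mean(axis=0)) / ref.std(axis=0)
ld = ref.T @ ref / ref.shape[0]              # 8 x 8 matrix, rank below 8

beta_hat = rng.standard_normal(8)            # stand-in for external GWAS estimates
beta_proj = project_onto_ld_range(ld, beta_hat)
```

In this toy setting the LD matrix is singular because the reference panel has fewer individuals than variants. After projection, `beta_proj` has no component along directions in which the LD matrix carries no information; under the assumption above, those are the directions along which a summary-statistic likelihood could otherwise grow without bound.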