Polygenic risk scores (PRS) developed from genome-wide association studies (GWAS) are of increasing interest for various clinical and research applications. Bayesian methods have been particularly popular for building PRS at genome-wide scale because of their natural ability to regularize models and borrow information in high dimensions. In this article, we present new theoretical results, methods, and extensive numerical studies to advance Bayesian methods for PRS applications. We conduct theoretical studies to identify the causes of convergence issues that some Bayesian methods exhibit when the required input data, namely GWAS summary statistics and linkage disequilibrium (LD, i.e. genetic correlation) estimates, are derived from distinct samples. We propose a remedy: projecting the summary statistics into the column space of the genetic correlation matrix. We further implement a PRS development algorithm under the Bayesian Bridge prior, which allows a more flexible specification of the effect-size distribution than popular alternative methods permit. Finally, we conduct benchmarking studies of alternative Bayesian methods using both simulations and real datasets, in which we investigate the effects of both prior specification and estimation strategies for LD parameters. These studies show that the proposed algorithm, equipped with the projection approach, the flexible prior specification, and an efficient numerical algorithm, leads to the most robust PRS across a wide variety of scenarios.
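To make the projection idea concrete, the following is a minimal NumPy sketch, not the article's implementation. It assumes the LD matrix is a symmetric positive semi-definite array and obtains its column space from an eigendecomposition; the function name `project_onto_column_space`, the relative tolerance `rel_tol`, and the toy inputs are hypothetical choices made only for illustration.

```python
import numpy as np


def project_onto_column_space(ld_matrix, summary_stats, rel_tol=1e-8):
    """Orthogonally project summary_stats onto the column space of ld_matrix.

    Assumption (not from the article): ld_matrix is symmetric positive
    semi-definite, so eigenvectors with non-negligible eigenvalues form an
    orthonormal basis of its column space.
    """
    R = np.asarray(ld_matrix, dtype=float)
    z = np.asarray(summary_stats, dtype=float)

    # Symmetrize to guard against small numerical asymmetries.
    R = (R + R.T) / 2.0

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(R)

    # Keep directions whose eigenvalue is clearly non-zero (hypothetical cutoff).
    keep = eigvals > rel_tol * eigvals.max()
    Q = eigvecs[:, keep]

    # Q Q^T z is the orthogonal projection of z onto span(Q) = col(R).
    return Q @ (Q.T @ z)


# Toy example: two perfectly correlated variants plus one independent variant,
# so the LD matrix has rank 2.
R_toy = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
z_toy = np.array([1.0, 0.0, 0.5])  # inconsistent with perfect correlation
z_proj = project_onto_column_space(R_toy, z_toy)
```

In this toy case the first two entries of `z_toy` disagree even though the reference LD says the variants are perfectly correlated; the projection replaces them with their common average, leaving a vector that the LD matrix can actually generate. For genome-wide data one would presumably apply such a step per LD block rather than to a single dense matrix, which is again an assumption rather than something stated above.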