We present a scalable framework for computing polygenic risk scores (PRS) in high-dimensional genomic settings using the recently introduced Univariate-Guided Sparse Regression (uniLasso). UniLasso is a two-stage penalized regression procedure that leverages univariate coefficients and their magnitudes to stabilize feature selection and enhance interpretability. Building on its theoretical and empirical advantages, we adapt uniLasso for application to the UK Biobank, a population-based repository comprising over one million genetic variants measured on hundreds of thousands of individuals from the United Kingdom. We further extend the framework to incorporate external summary statistics to increase predictive accuracy. Our results demonstrate that uniLasso attains predictive performance comparable to standard Lasso while selecting substantially fewer variants, yielding sparser and more interpretable models. It also outperforms competing methods, such as PRS-CS, in estimating PRS. Integrating external scores further improves prediction while maintaining sparsity.
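To make the two-stage structure concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that stage one fits a separate least-squares regression for each variant and records its leave-one-out predictions, and that stage two fits a non-negative Lasso on those predictions, so that each final coefficient is a univariate slope scaled by a non-negative weight. The function names (`univariate_loo_fits`, `nonneg_lasso`, `unilasso_sketch`), the plain coordinate-descent solver, the penalty value, and the simulated genotype data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def univariate_loo_fits(X, y):
    """Stage 1 (assumed): per-feature least squares and leave-one-out predictions."""
    n, _ = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    sxx = (Xc ** 2).sum(axis=0)              # assumes no constant columns
    slopes = Xc.T @ (y - y_mean) / sxx
    intercepts = y_mean - slopes * x_mean
    fitted = intercepts + X * slopes         # shape (n, p)
    leverage = 1.0 / n + Xc ** 2 / sxx       # hat-matrix diagonal per feature
    resid = y[:, None] - fitted
    # Closed-form leave-one-out prediction for simple linear regression.
    loo = y[:, None] - resid / (1.0 - leverage)
    return slopes, intercepts, loo


def nonneg_lasso(F, y, lam, n_iter=500, tol=1e-8):
    """Stage 2 (assumed): minimise (1/2n)||y - t0 - F t||^2 + lam * sum(t), t >= 0."""
    n, p = F.shape
    f_mean = F.mean(axis=0)
    Fc = F - f_mean
    col_sq = (Fc ** 2).sum(axis=0) / n
    theta = np.zeros(p)
    resid = y - y.mean()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Fc[:, j] @ resid / n + col_sq[j] * theta[j]
            new = max(rho - lam, 0.0) / col_sq[j]   # one-sided soft threshold
            delta = new - theta[j]
            if delta != 0.0:
                resid -= Fc[:, j] * delta
                theta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    theta0 = y.mean() - f_mean @ theta
    return theta0, theta


def unilasso_sketch(X, y, lam):
    """Combine both stages into coefficients on the original features."""
    slopes, intercepts, loo = univariate_loo_fits(X, y)
    theta0, theta = nonneg_lasso(loo, y, lam)
    coef = slopes * theta                    # sign follows the univariate slope
    intercept = theta0 + intercepts @ theta
    return intercept, coef


# Toy data: simulated genotype dosages (0/1/2), not UK Biobank data.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:3] = [1.0, -0.8, 0.5]
y = X @ beta + rng.normal(scale=0.5, size=n)

intercept, coef = unilasso_sketch(X, y, lam=0.05)
prs = intercept + X @ coef                   # illustrative per-individual score
print("variants selected:", int(np.count_nonzero(coef)))
```

Because the stage-two weights are constrained to be non-negative, every selected variant keeps the sign of its univariate slope in this sketch. At biobank scale the dense arrays and the inner Python loop would need to be replaced by a batched or out-of-core solver, and the penalty would normally be chosen by cross-validation rather than fixed.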