Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent remain under-represented in them, underscoring a critical gap in genetic research. Here, we assess whether multiomic data can improve disease prediction across diverse ancestries. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso for disease prediction, focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British individuals and individuals of other ancestries, and validated in a cohort of over 96,000 individuals for 8 diseases. Of the 96 models trained, 16 showed a statistically significant incremental improvement in predictive performance, as measured by ROC-AUC. These findings suggest that advanced statistical methods that borrow information across multiple ancestries may improve disease risk prediction, though the benefit is limited.