Ancestry-specific proteome-wide association studies (PWAS) based on genetically predicted protein expression can reveal complex disease etiology specific to certain ancestral groups. These studies require ancestry-specific models for protein expression as a function of SNP genotypes. To improve protein expression prediction in ancestral populations historically underrepresented in genomic studies, we propose a new penalized maximum likelihood estimator for fitting ancestry-specific joint protein quantitative trait loci models. Our estimator borrows information across ancestral groups while allowing for heterogeneous error variances and regression coefficients. We propose an alternative parameterization of our model that makes the objective function convex and the penalty scale-invariant. To improve computational efficiency, we propose an approximate version of our method and study its theoretical properties. Our method substantially improves protein expression prediction accuracy in individuals of African ancestry and, in a downstream PWAS analysis, leads to the discovery of multiple associations between protein expression and blood lipid traits in the African ancestry population.