Linkage disequilibrium score regression (LDSC) has emerged as an essential tool for genetic and genomic analyses of complex traits, drawing on high-dimensional data from genome-wide association studies (GWAS). LDSC computes linkage disequilibrium (LD) scores from an external reference panel and combines them with only summary-level data from the original GWAS. In this paper, we investigate LDSC within a fixed-effect data integration framework, underscoring its ability to merge multi-source GWAS data and reference panels. In particular, we account for the genome-wide dependence among the high-dimensional GWAS summary statistics, along with the block-diagonal dependence pattern in the estimated LD scores. Our analysis uncovers several key factors, in both the original GWAS and the reference panel datasets, that determine the performance of LDSC. We show that asymptotic normality is relatively easy to achieve for LDSC-based estimators applied to genome-wide genetic variants (e.g., in genetic variance and covariance estimation), but considerably harder to achieve when the analysis focuses on a much smaller subset of genetic variants (e.g., in partitioned heritability analysis). Moreover, by modeling the disparities in LD patterns across different populations, we show that LDSC can be extended to cross-ancestry analyses using data from distinct global populations (such as European and Asian populations). We validate our theoretical findings through extensive numerical evaluations using real genetic data from the UK Biobank study.
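To make the two ingredients named above concrete (LD scores from a reference panel, then a regression on GWAS summary statistics), the following is a minimal sketch and not the estimator analyzed in the paper. It assumes the standard LDSC relation E[chi2_j] ≈ 1 + N * h2 * l_j / M, where l_j is the sum of squared correlations between variant j and nearby variants. The function names `ld_scores` and `ldsc_heritability`, the index-based window, and the use of unweighted ordinary least squares are illustrative assumptions; practical LDSC implementations typically also apply a bias adjustment to the squared correlations and use regression weights.

```python
import numpy as np


def ld_scores(ref_genotypes, window):
    """LD score of each variant from a reference panel (hypothetical helper).

    ref_genotypes: (n_ref, M) array, one column per variant.
    window: variants within this many index positions count as "nearby"
            (an assumption; real analyses usually use genetic distance).
    """
    # Standardize each column so X.T @ X / n is the sample correlation matrix.
    X = (ref_genotypes - ref_genotypes.mean(axis=0)) / ref_genotypes.std(axis=0)
    n_ref, m = X.shape
    r2 = ((X.T @ X) / n_ref) ** 2  # squared sample correlations, no bias adjustment
    idx = np.arange(m)
    in_window = np.abs(idx[:, None] - idx[None, :]) <= window
    return (r2 * in_window).sum(axis=1)


def ldsc_heritability(z, ld, n_gwas):
    """Regress chi-square statistics on LD scores with plain OLS (a simplification).

    z: (M,) GWAS z-scores; ld: (M,) LD scores; n_gwas: GWAS sample size.
    Returns (heritability estimate, regression intercept).
    """
    chi2 = np.asarray(z, dtype=float) ** 2
    m = chi2.shape[0]
    design = np.column_stack([np.ones(m), ld])
    intercept, slope = np.linalg.lstsq(design, chi2, rcond=None)[0]
    # Under the assumed model the slope equals N * h2 / M.
    return slope * m / n_gwas, intercept
```

The sketch only needs z-scores and LD scores at the regression step, which reflects the point that individual-level GWAS data are not required once the reference panel supplies the LD scores.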