Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with various phenotypes and complex diseases. However, the identified variants do not fully explain the heritability of complex traits, a gap known as the missing heritability problem. To address this challenge, controlling false positives while maximizing the number of true associations detected, we propose two approaches that use linkage disequilibrium (LD) scores as covariates. We apply principal component analysis (PCA), a dimensionality reduction technique, to control the false discovery rate (FDR) in the presence of high-dimensional covariates. This approach makes it easier to interpret how multiple high-dimensional covariates affect FDR control, and it offers higher statistical power than analyses that do not use covariates. We further investigate how covariates increase statistical power through a range of simulation experiments, and we compare the simulation results with real-data examples to aid interpretation. We evaluate the proposed approaches on real-world datasets, including a GWAS with body mass index (BMI) as the phenotype. By incorporating LD scores as covariates in FDR-controlled GWAS analyses, we demonstrate that the approaches select informative LD scores and improve the identification of significant SNPs. Our methods reduce the computational burden and enhance interpretability while retaining the essential information in the LD scores. Overall, our study contributes to the advancement of statistical methods for GWAS and provides practical guidance for researchers seeking to improve the precision of genetic association analyses.
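The abstract does not state which covariate-adaptive FDR procedure the two approaches build on, so the following is only a minimal sketch of the two generic ingredients it names: reducing several LD-score covariates to a leading principal component, and controlling the FDR. The Benjamini-Hochberg procedure stands in here as the standard no-covariate baseline, not as the authors' method. The function names `first_principal_component` and `benjamini_hochberg`, and all numbers in the toy data, are hypothetical and chosen purely for illustration.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Score each row on the leading principal component via power iteration.

    Assumes the covariance matrix is non-zero and that the uniform starting
    vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered covariates.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the estimated leading direction.
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where BH rejects at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Toy data (made up): four SNPs, each with three hypothetical LD-score covariates.
toy_ld_scores = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

pc1_scores = first_principal_component(toy_ld_scores)
bh_rejections = benjamini_hochberg(toy_pvalues, alpha=0.05)
```

In a covariate-adaptive analysis, a one-dimensional summary such as `pc1_scores` would replace the full set of LD scores as the covariate, which is what keeps the procedure tractable and interpretable. The BH rejections give the baseline against which any gain in power from the covariate can be measured.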