Obesity is widely recognized as a serious and pervasive health concern. We study obesity through body mass index (BMI), which is known to be highly heritable, and identify important genetic risk factors for BMI among hundreds of thousands of single nucleotide polymorphisms (SNPs) in the Framingham Study data. Traditional genome-wide association studies (GWAS) face several challenges: (1) they have low power, owing to a limited number of participants combined with the stringent genome-wide significance threshold; (2) prior knowledge from large meta-analyses could provide valuable guidance but is often underutilized; (3) the one-at-a-time univariate marginal regression framework ignores the joint and conditional nature of genetic effects; (4) GWAS focus solely on mean outcomes, whereas obesity inherently concerns abnormally high BMI levels. To address these challenges, we propose and apply a novel Knowledge Integration Quantile Regression (KIQR) approach that performs simultaneous variable selection and estimation. The approach focuses on the conditional high quantiles of BMI, which are most relevant to obesity risk, and integrates prior information from large-scale studies such as the GIANT consortium and UK Biobank. Notably, we identify promising novel associations that have not previously been reported in the GWAS literature: rs3798696 in \textit{TFAP2A}, rs7070523 in \textit{ITIH5}, and rs178260 in \textit{AIFM3}. These findings provide new insights into the genetic architecture of obesity and demonstrate that quantile-based modeling with integrated prior knowledge can potentially uncover novel genes missed by traditional GWAS approaches. An R implementation and simulation scripts are available at: https://github.com/KIQR-submission/KIQR
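To make challenge (4) concrete, the following minimal Python sketch shows the standard quantile (check) loss that underlies any quantile regression, and how its minimizer differs from the mean on right-skewed data. It is not the KIQR method or the authors' R implementation: the function names `check_loss` and `best_constant` and the toy BMI values are assumptions invented purely for illustration, and the sketch fits only a constant, with no covariates, penalty, or prior information.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile (check) loss: sum of r * (tau - 1{r < 0}) over residuals r."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0).astype(float))))


def best_constant(y, tau):
    """Return the observed value of y that minimizes the check loss.

    The check loss is piecewise linear in the constant, so a minimizer
    is always attained at one of the observed values.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])


if __name__ == "__main__":
    # Hypothetical, made-up BMI values with a heavy right tail.
    bmi = [20, 21, 22, 23, 24, 25, 26, 27, 35, 45]
    print("mean BMI:", np.mean(bmi))
    print("0.85-quantile fit:", best_constant(bmi, 0.85))
```

On these toy values the mean sits near the bulk of the data, while the fit at the 0.85 quantile lands in the upper tail, which is the region relevant to obesity. A mean-based analysis would summarize this tail only indirectly.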