Obesity is widely recognized as a critical and pervasive health concern. We aim to identify important genetic risk factors for obesity from hundreds of thousands of single nucleotide polymorphisms (SNPs). We propose and apply a novel Quantile Regression with Insight Fusion (QRIF) approach that integrates insights from established studies or domain knowledge to perform variable selection and modeling simultaneously for ultra-high dimensional genetic data, focusing on the high conditional quantiles of body mass index (BMI) that are of most interest. We discover new SNPs of interest and provide a more comprehensive view of the genetic risk factors underlying different levels of BMI, which may pave the way for more precise and targeted treatment strategies. The QRIF approach is designed to balance the trade-off between prior insights and the observed data while remaining robust to potentially false prior information. We further establish desirable asymptotic properties despite two technical challenges: the non-differentiable check loss function, which we handle via a Huber loss approximation, and the nonconvex SCAD penalty, which we handle via a local linear approximation. Finally, we develop an efficient algorithm for the QRIF approach, and our simulation studies demonstrate its effectiveness.
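As an illustration of the two approximations named in the abstract, the following is a minimal sketch and not the QRIF algorithm itself. It assumes one common Huber-type smoothing of the check loss (writing the check loss as |u|/2 + (tau - 1/2)u and replacing |u| with the Huber function) and the standard SCAD derivative with a = 3.7 as the weight in a local linear approximation. The function names, the smoothing parameter `delta`, and the initial estimate `beta_init` are hypothetical choices made for this example; the abstract does not specify the exact forms used.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def huber_check_loss(u, tau, delta):
    """Smooth surrogate for the check loss (assumed form, for illustration).

    Uses rho_tau(u) = |u|/2 + (tau - 1/2) * u and replaces |u| with the
    Huber function, which is quadratic on [-delta, delta] and linear outside.
    """
    u = np.asarray(u, dtype=float)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= delta, u**2 / (2.0 * delta), abs_u - delta / 2.0)
    return 0.5 * huber + (tau - 0.5) * u


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))


def lla_weights(beta_init, lam, a=3.7):
    """Local linear approximation: the SCAD penalty is replaced by a
    weighted L1 penalty with weights p'_lam(|beta_init_j|)."""
    return scad_derivative(beta_init, lam, a)


if __name__ == "__main__":
    tau, delta = 0.9, 0.1
    grid = np.linspace(-2.0, 2.0, 401)
    gap = np.max(np.abs(check_loss(grid, tau) - huber_check_loss(grid, tau, delta)))
    print("max gap between check loss and smoothed loss:", gap)

    beta_init = np.array([0.0, 0.5, 2.0, 5.0])  # hypothetical initial estimate
    print("LLA weights:", lla_weights(beta_init, lam=1.0))
```

Under this smoothing, the surrogate is differentiable everywhere and differs from the check loss by at most delta/4, so a smaller `delta` gives a closer but less smooth approximation. The LLA weights shrink to zero for large initial coefficients, so strong signals are left essentially unpenalized while small coefficients receive the full L1 penalty.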