Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models (GLMs) have been proposed for hierarchical selection of gene by environment interaction (GEI) effects, in which a GEI effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows the inclusion of random effects to account for population structure, subject relatedness and shared environmental exposure. In this paper, we develop a unified approach based on regularized penalized quasi-likelihood (PQL) estimation to perform hierarchical selection of GEI effects in sparse regularized mixed models. Through simulations in the presence of population structure and shared environmental exposure, we compare the selection and prediction accuracy of our proposed model with those of existing methods. We show that, in all simulation scenarios and compared with other penalized methods, our proposed method enforces sparsity by controlling the number of false positives in the model while achieving the best predictive performance. Finally, we apply our method to real data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study and find that it retrieves previously reported significant loci.
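The hierarchy constraint described above (a GEI effect may enter the model only if its genetic main effect does) can be illustrated with a small sketch. It assumes a common multiplicative reparameterization, gamma_j = tau_j * beta_j, which forces each interaction coefficient to be exactly zero whenever its main effect is zero. This is an illustrative assumption, not necessarily the parameterization used by the proposed PQL estimator, and the function names are hypothetical.

```python
import numpy as np


def interaction_effects(beta_g, tau):
    """Hypothetical helper: build GEI coefficients under the assumed
    multiplicative reparameterization gamma_j = tau_j * beta_j.
    If beta_j == 0, gamma_j is 0 regardless of tau_j (strong hierarchy)."""
    beta_g = np.asarray(beta_g, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return tau * beta_g


def satisfies_strong_hierarchy(beta_g, gamma, tol=0.0):
    """Hypothetical helper: return True if every nonzero interaction
    coefficient has a nonzero genetic main effect."""
    beta_g = np.asarray(beta_g, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    active_gei = np.abs(gamma) > tol
    active_main = np.abs(beta_g) > tol
    # An active interaction without an active main effect violates the hierarchy.
    return bool(np.all(~active_gei | active_main))


if __name__ == "__main__":
    beta_g = [0.8, 0.0, -0.3]   # genetic main effects (made-up values)
    tau = [0.5, 2.0, 0.0]       # free interaction multipliers (made-up values)
    gamma = interaction_effects(beta_g, tau)
    print(gamma, satisfies_strong_hierarchy(beta_g, gamma))
```

Because the second main effect is zero, its interaction is zero even though tau is nonzero, so the constraint holds by construction rather than by a post hoc check.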