Many diseases and traits involve a complex interplay between genes and environment, generating significant interest in studying gene-environment interaction through observational data. However, lifestyle and environmental risk factors are often susceptible to unmeasured confounding, which may bias the assessment of the joint effect of gene and environment. Recently, Mendelian randomization (MR) has evolved into a versatile method for assessing causal relationships from observational data while accounting for unmeasured confounders. This approach uses genetic variants as instrumental variables (IVs) and aims to offer reliable statistical tests and estimates of causal effects. MR has gained substantial popularity in recent years, largely due to the success of large-scale genome-wide association studies in identifying genetic variants associated with lifestyle and environmental factors. Many methods have been developed for MR; however, little work has been done on evaluating gene-environment interaction. In this paper, we focus on two primary IV approaches, the 2-stage predictor substitution (2SPS) and the 2-stage residual inclusion (2SRI), and extend them to accommodate gene-environment interaction under linear and logistic regression models for continuous and binary outcomes, respectively. Extensive simulations and analytical derivations show that finding solutions in the linear regression setting is relatively straightforward; however, the logistic regression model is significantly more complex and demands additional effort.
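To make the two IV approaches concrete for the linear (continuous-outcome) case, the following is a minimal sketch and not the estimators derived in the paper. Everything in it is an illustrative assumption: the simulated data-generating model, the effect sizes, the single instrument `Z`, the choice to include the gene `G` in the first stage, and the function names `simulate`, `first_stage`, `naive_ols`, `two_sps`, and `two_sri`. The sketch also assumes that `G` is independent of the instrument and of the unmeasured confounder `U`. In 2SPS, the exposure `E` is replaced by its first-stage prediction in both the main-effect and interaction terms; in 2SRI, the observed exposure is kept and the first-stage residual is added as a covariate.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already hold an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n=100_000, seed=0):
    """Hypothetical data: instrument Z, gene G, confounded exposure E, outcome Y."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, n).astype(float)  # instrument for the exposure
    G = rng.binomial(2, 0.4, n).astype(float)  # gene whose interaction with E is of interest
    U = rng.normal(size=n)                     # unmeasured confounder of E and Y
    E = 1.0 * Z + U + rng.normal(size=n)
    # Assumed true effects: G = 0.2, E = 0.5, GxE = 0.3
    Y = 0.2 * G + 0.5 * E + 0.3 * G * E + U + rng.normal(size=n)
    return Z, G, E, Y


def first_stage(Z, G, E):
    """Stage 1: regress E on the instrument (and G); return fitted values and residuals."""
    X1 = np.column_stack([np.ones_like(Z), Z, G])
    E_hat = X1 @ ols(X1, E)
    return E_hat, E - E_hat


def naive_ols(G, E, Y):
    """Ordinary regression that ignores confounding, for comparison."""
    b = ols(np.column_stack([np.ones_like(G), G, E, G * E]), Y)
    return {"G": b[1], "E": b[2], "GxE": b[3]}


def two_sps(Z, G, E, Y):
    """2SPS: substitute the predicted exposure in the main and interaction terms."""
    E_hat, _ = first_stage(Z, G, E)
    b = ols(np.column_stack([np.ones_like(G), G, E_hat, G * E_hat]), Y)
    return {"G": b[1], "E": b[2], "GxE": b[3]}


def two_sri(Z, G, E, Y):
    """2SRI: keep the observed exposure and include the first-stage residual."""
    _, resid = first_stage(Z, G, E)
    b = ols(np.column_stack([np.ones_like(G), G, E, G * E, resid]), Y)
    return {"G": b[1], "E": b[2], "GxE": b[3]}


if __name__ == "__main__":
    Z, G, E, Y = simulate()
    for name, est in [("naive", naive_ols(G, E, Y)),
                      ("2SPS", two_sps(Z, G, E, Y)),
                      ("2SRI", two_sri(Z, G, E, Y))]:
        print(name, {k: round(float(v), 3) for k, v in est.items()})
```

Under these simulated assumptions, both two-stage fits should recover the exposure and interaction coefficients approximately, while the naive fit overstates the exposure effect. The sketch covers only the linear setting; for a binary outcome with a logistic second stage, simply swapping in a logistic fit is not guaranteed to give consistent estimates, which is the harder case the abstract refers to.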