Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables

From arXiv: 26 pages, 5 figures, 3 supplementary figures. Code available at https://github.com/mariyam-khan/Causal_genes_GWAS_loci_CAD . Supporting data available at https://dataverse.no/dataset.xhtml?persistentId=doi:10.18710/VM0WKQ

Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent, which is usually not possible when considering a group of candidate genes from the same locus. We used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results even at modest sample sizes. Importantly, the causal effect estimates remain unbiased and their variance remains small when instruments are highly correlated. We applied MVMR with correlated instrumental variable sets at risk loci from genome-wide association studies (GWAS) for coronary artery disease using eQTL data from the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple co-located genes in multiple tissues. However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR be run on a tissue-by-tissue basis, and testing all gene-tissue pairs at a given locus in a single model to predict causal gene-tissue combinations remains infeasible.
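The central claim, that correlated instruments can still identify the direct effects of several exposures jointly, can be illustrated with a small simulation. The sketch below is not the authors' implementation (their code is in the repository linked above). It assumes individual-level data and plain two-stage least squares, and every function name, effect size, and correlation value in it is a hypothetical choice made for illustration: four equicorrelated instruments, two gene-expression exposures that share all instruments, one unobserved confounder, and an outcome caused only by the first gene.

```python
import numpy as np


def simulate(n=20000, n_snps=4, rho=0.8, seed=0):
    """Simulate a toy locus (all numbers are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Equicorrelated Gaussian stand-ins for genotypes in linkage disequilibrium.
    cov = np.full((n_snps, n_snps), rho) + (1.0 - rho) * np.eye(n_snps)
    G = rng.multivariate_normal(np.zeros(n_snps), cov, size=n)
    # Pleiotropic regulation: every instrument affects both genes.
    A = np.array([[0.5, 0.1],
                  [0.4, 0.3],
                  [0.1, 0.5],
                  [0.3, -0.4]])
    U = rng.normal(size=n)  # unobserved confounder of exposures and outcome
    X = G @ A + U[:, None] + rng.normal(size=(n, 2))
    beta = np.array([0.5, 0.0])  # gene 1 is causal, gene 2 is not
    y = X @ beta + U + rng.normal(size=n)
    return G, X, y, beta


def two_stage_least_squares(G, X, y):
    """Multivariable 2SLS: regress X on G, then y on the fitted exposures."""
    G = G - G.mean(axis=0)
    X = X - X.mean(axis=0)
    y = y - y.mean()
    first_stage = np.linalg.lstsq(G, X, rcond=None)[0]
    X_hat = G @ first_stage
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def ordinary_least_squares(X, y):
    """Naive regression of y on X, which ignores the confounder."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    return np.linalg.lstsq(X, y, rcond=None)[0]


if __name__ == "__main__":
    G, X, y, beta = simulate()
    print("true effects:", beta)
    print("2SLS        :", two_stage_least_squares(G, X, y))
    print("naive OLS   :", ordinary_least_squares(X, y))
```

In this toy setting the two exposures are regressed jointly on the full instrument set, so no single instrument needs to be specific to one gene or independent of the others. The naive regression is included only as a contrast, since it absorbs the confounder into both effect estimates.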
