Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables

Comments (from arXiv): Revised version, 30 pages, 5 figures. The "TeX Source" contains the file SI.pdf with Supplementary Information (18 pages, 7 figures). Code is available at https://github.com/mariyam-khan/Causal_genes_GWAS_loci_CAD and supporting data at https://dataverse.no/dataset.xhtml?persistentId=doi:10.18710/VM0WKQ

Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent, which is usually not possible when considering a group of candidate genes from the same locus. We used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This condition, a classical result of Brito and Pearl (2002) for structural equation models, guarantees the identifiability of causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results even at modest sample sizes. Importantly, the causal effect estimates remain unbiased and their variance remains small when instruments are highly correlated. We applied MVMR with correlated instrumental variable sets at risk loci from genome-wide association studies (GWAS) for coronary artery disease using eQTL data from the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple colocated genes in multiple tissues. However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR be run on a tissue-by-tissue basis; testing all gene-tissue pairs at a given locus in a single model to predict causal gene-tissue combinations remains infeasible.
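The identifiability claim in the abstract can be illustrated with a small individual-level simulation. The sketch below is not the authors' code or estimator: it assumes Gaussian stand-ins for genotypes, three equicorrelated instruments, two exposures sharing an unmeasured confounder, and a textbook two-stage least squares (2SLS) estimator. All names, effect sizes, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=20000, rho=0.8, seed=0):
    """Simulate correlated instruments Z, two confounded exposures X, outcome Y."""
    rng = np.random.default_rng(seed)
    k = 3  # number of instruments (hypothetical stand-ins for variants in one locus)
    # Equicorrelated instruments: unit variance, pairwise correlation rho.
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
    Z = rng.multivariate_normal(np.zeros(k), cov, size=n)
    U = rng.normal(size=n)  # unmeasured confounder of exposures and outcome
    # Assumed instrument-to-exposure effects: every instrument affects both
    # exposures (pleiotropic regulation), so no instrument is exposure-specific.
    A = np.array([[1.0, 0.3],
                  [0.2, 1.0],
                  [0.6, -0.5]])
    X = Z @ A + U[:, None] + rng.normal(size=(n, 2))
    beta = np.array([0.5, 0.0])  # assumed true direct effects; exposure 2 is non-causal
    Y = X @ beta + U + rng.normal(size=n)
    return Z, X, Y, beta


def ordinary_least_squares(X, Y):
    """Naive regression of Y on X; biased here because U confounds X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return coef


def two_stage_least_squares(Z, X, Y):
    """2SLS: project the exposures on the instruments, then regress Y on the projections."""
    Zc = Z - Z.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    first_stage, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)  # shape (k, 2)
    X_hat = Zc @ first_stage  # part of X explained by the instrument set
    beta_hat, *_ = np.linalg.lstsq(X_hat, Yc, rcond=None)
    return beta_hat


if __name__ == "__main__":
    Z, X, Y, beta = simulate()
    print("true direct effects:", beta)
    print("naive OLS estimate: ", ordinary_least_squares(X, Y))
    print("2SLS estimate:      ", two_stage_least_squares(Z, X, Y))
```

In this toy setting the instrument set is valid jointly for both exposures even though the instruments are strongly correlated and each one affects both exposures. It needs at least as many instruments as exposures, and the instrument-to-exposure effect matrix must have full column rank. Analyses based on GWAS and eQTL summary statistics would use a different estimator than the individual-level 2SLS shown here.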
