Causal mediation analysis investigates how an intermediary factor, called a mediator, regulates the causal effect of a treatment on an outcome. With measurements on large numbers of potential mediators increasingly available, methods for selecting important mediators have been proposed. However, these methods often assume the absence of unmeasured mediator-outcome confounding. We allow for such confounding in a linear structural equation model for the outcome and propose an approach to mediator selection in this setting. To this end, we first identify the causal parameters by constructing a pseudo proxy variable for the unmeasured confounding. Leveraging this proxy variable, we propose a partially penalized method to identify the mediators that affect the outcome. The resulting estimates are consistent, and the estimates of the nonzero parameters are asymptotically normal. Motivated by these results, we introduce a two-step procedure that consistently selects active mediation pathways, eliminating the need to test a composite null hypothesis for each mediator, as traditional methods commonly require. Simulation studies demonstrate the superior performance of our approach compared to existing methods. Finally, we apply our approach to genomic data, identifying gene expressions that potentially mediate the impact of a genetic variant on mouse obesity.
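To make the phrase "composite null hypotheses" concrete, the following is a minimal sketch of a generic linear mediation model with an unmeasured confounder. The notation ($T$, $M_j$, $Y$, $U$, $\alpha_j$, $\beta_j$, $\gamma$, $\delta$, $\lambda_j$) is assumed here for illustration and is not taken from the paper, whose exact model may differ.

```latex
% Illustrative notation only (assumed, not the paper's):
% T = treatment, M_j = j-th mediator, Y = outcome, U = unmeasured confounder.
\begin{align*}
  M_j &= \alpha_j T + \lambda_j U + \varepsilon_j, \qquad j = 1, \dots, p, \\
  Y   &= \gamma T + \sum_{j=1}^{p} \beta_j M_j + \delta U + e .
\end{align*}
% Under such a model, the pathway through M_j is active when both the
% treatment-mediator and mediator-outcome coefficients are nonzero:
\[
  \alpha_j \beta_j \neq 0 .
\]
% The corresponding null hypothesis is composite, since it is the union
% of three distinct cases:
\[
  H_{0j} : \alpha_j \beta_j = 0
  \iff
  (\alpha_j = 0,\ \beta_j \neq 0)
  \ \text{or}\
  (\alpha_j \neq 0,\ \beta_j = 0)
  \ \text{or}\
  (\alpha_j = 0,\ \beta_j = 0).
\]
```

Under this assumed notation, the set of active mediators is the intersection $\{j : \alpha_j \neq 0\} \cap \{j : \beta_j \neq 0\}$, which suggests why a procedure that selects each set consistently can avoid testing $H_{0j}$ directly.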