Mediation analysis aims to identify and estimate the part of an exposure's effect on an outcome that is transmitted through one or more intermediate variables. When there are multiple intermediate variables, two methodological questions arise: how to estimate mediated effects when the mediators are correlated, and how to perform high-dimensional mediation analysis when the number of mediators exceeds the sample size. This paper presents a two-step procedure for high-dimensional mediation analysis. The first step selects a reduced set of candidate mediators using an ad-hoc lasso penalty. The second step applies a procedure we previously developed to estimate the mediated and direct effects, accounting for the correlation structure among the retained candidate mediators. We compare the performance of the proposed two-step procedure with state-of-the-art methods using simulated data. We also demonstrate its practical application by estimating, on real data, the causal role of DNA methylation in the pathway between smoking and rheumatoid arthritis.
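To make the two-step idea concrete, the following is a minimal sketch on simulated data, not the procedure of this paper. It rests on several assumptions that the abstract does not specify: step 1 is stood in for by a plain lasso of the outcome on the mediators with the exposure left unpenalized, and step 2 is stood in for by a linear product-of-coefficients estimate with a joint outcome model for the retained mediators. The function names (`lasso_select`, `estimate_effects`), the penalty value `lam=0.3`, and the simulation settings are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_select(x, M, y, lam, n_sweeps=100):
    """Step 1 (stand-in): coordinate-descent lasso of y on [x, M].

    The exposure x is unpenalized; mediator coefficients carry an L1 penalty.
    Returns the indices of mediators with a nonzero coefficient.
    """
    n, p = M.shape
    x = x - x.mean()
    y = y - y.mean()
    M = (M - M.mean(axis=0)) / M.std(axis=0)  # standardize mediators
    col_sq = (M ** 2).sum(axis=0)
    gamma, beta = 0.0, np.zeros(p)
    resid = y.copy()
    for _ in range(n_sweeps):
        # Unpenalized least-squares update for the exposure coefficient.
        resid += x * gamma
        gamma = (x @ resid) / (x @ x)
        resid -= x * gamma
        # Penalized updates for each mediator coefficient.
        for j in range(p):
            resid += M[:, j] * beta[j]
            beta[j] = soft_threshold(M[:, j] @ resid, n * lam) / col_sq[j]
            resid -= M[:, j] * beta[j]
    return np.flatnonzero(beta)

def estimate_effects(x, M_sel, y):
    """Step 2 (stand-in): product-of-coefficients on the retained mediators.

    alpha_j is the slope of mediator j on the exposure; beta_j and the direct
    effect come from one joint regression of y on x and all retained
    mediators, so correlated mediators are adjusted for each other.
    """
    n = len(y)
    ones = np.ones(n)
    alpha = np.linalg.lstsq(np.column_stack([ones, x]), M_sel, rcond=None)[0][1]
    coef = np.linalg.lstsq(np.column_stack([ones, x, M_sel]), y, rcond=None)[0]
    direct, beta = coef[1], coef[2:]
    return direct, alpha * beta

# Simulated example with more mediators (p) than samples (n).
rng = np.random.default_rng(0)
n, p = 200, 300
x = rng.normal(size=n)
M = rng.normal(size=(n, p))
M[:, :3] += 0.8 * x[:, None]  # only the first three mediators depend on x
y = 0.5 * x + M[:, :3].sum(axis=1) + rng.normal(size=n)

selected = lasso_select(x, M, y, lam=0.3)
direct, indirect = estimate_effects(x, M[:, selected], y)
total = np.polyfit(x, y, 1)[0]  # total effect: slope of y on x alone

print("retained mediators:", selected)
print("direct effect:", direct)
print("sum of mediated effects:", indirect.sum())
```

In this linear setting the estimated total effect decomposes exactly into the direct effect plus the sum of the mediator-specific products, which gives a quick sanity check on the second step. Estimates computed after selection on the same data are optimistic, so a real analysis would need to address that, for example by sample splitting.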