We study high-dimensional mediation analysis in which exposures, mediators, and outcomes are all multivariate, and both exposures and mediators may be high-dimensional. We formalize this as a many (exposures)-to-many (mediators)-to-many (outcomes) (MMM) mediation analysis problem. Methodologically, MMM mediation analysis simultaneously performs variable selection for high-dimensional exposures and mediators, estimates the indirect effect matrices (i.e., the coefficient matrices linking the exposure-to-mediator and mediator-to-outcome pathways), and enables prediction of multivariate outcomes. Theoretically, we show that the estimated indirect effect matrices are consistent and element-wise asymptotically normal, and we derive error bounds for the estimators. To evaluate the MMM mediation framework, we first use simulation studies to investigate its finite-sample performance, including convergence properties, the behavior of the asymptotic approximations, and robustness to noise. We then apply MMM mediation analysis to data from the Alzheimer's Disease Neuroimaging Initiative to study how the cortical thickness of 202 brain regions may mediate the effects of 688 genome-wide significant single nucleotide polymorphisms (SNPs), selected from approximately 1.5 million SNPs, on eleven cognitive-behavioral and diagnostic outcomes. The MMM mediation framework identifies biologically interpretable many-to-many-to-many genetic-neural-cognitive pathways and improves out-of-sample classification and prediction performance in downstream tasks. Taken together, our results demonstrate the potential of MMM mediation analysis and highlight the value of statistical methodology for investigating complex, high-dimensional, multi-layer pathways in science. The MMM package is available at https://github.com/THELabTop/MMM-Mediation.
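To make the notion of an indirect effect matrix concrete, here is a minimal sketch under an assumed linear model, M = XA + E_M and Y = MB + XC + E_Y. This model is our illustration; the paper's exact specification may differ. Under this model, the indirect effect of the exposures on the outcomes is the matrix product AB. The code simulates such data and computes a naive least-squares plug-in estimate of AB. It performs no variable selection or regularization, so it is not the MMM estimator. The function names `simulate_mmm` and `indirect_effect_ols` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_mmm(n, A, B, C, noise=0.2, seed=0):
    """Simulate exposures X -> mediators M -> outcomes Y under an assumed linear model.

    A: (p, q) exposure-to-mediator coefficients
    B: (q, r) mediator-to-outcome coefficients
    C: (p, r) direct exposure-to-outcome coefficients
    """
    rng = np.random.default_rng(seed)
    p, q = A.shape
    r = B.shape[1]
    X = rng.standard_normal((n, p))
    M = X @ A + noise * rng.standard_normal((n, q))
    Y = M @ B + X @ C + noise * rng.standard_normal((n, r))
    return X, M, Y


def indirect_effect_ols(X, M, Y):
    """Naive least-squares plug-in estimate of the indirect effect matrix A @ B.

    Illustration only: no sparsity, variable selection, or inference.
    """
    # Regress mediators on exposures to estimate A.
    A_hat, *_ = np.linalg.lstsq(X, M, rcond=None)
    # Regress outcomes on mediators and exposures jointly to estimate B (and C).
    coef, *_ = np.linalg.lstsq(np.hstack([M, X]), Y, rcond=None)
    q = M.shape[1]
    B_hat = coef[:q]
    return A_hat, B_hat, A_hat @ B_hat
```

In a high-dimensional setting such as the ADNI analysis, where the number of SNPs or brain regions can rival or exceed the sample size, these unpenalized regressions become ill-posed. This is why a method that combines variable selection with estimation is needed.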