Unveiling Causal Mediation Pathways in High-Dimensional Mixed Exposures: A Data-Adaptive Target Parameter Strategy

Mediation analysis in causal inference typically concentrates on one binary exposure, using deterministic interventions to split the average treatment effect into direct and indirect effects through a single mediator. Yet real-world exposure scenarios often involve multiple continuous exposures that affect health outcomes through varied mediation pathways, and these pathways are unknown a priori. To address this complexity, we introduce NOVAPathways, a methodological framework that identifies exposure-mediation pathways and yields unbiased estimates of the direct and indirect effects of intervening on these pathways. By pairing data-adaptive target parameters with stochastic interventions, we offer a semi-parametric approach for estimating causal effects in the context of high-dimensional continuous, binary, and categorical exposures and mediators. In our proposed cross-validation procedure, we apply sequential semi-parametric regressions to a parameter-generating fold of the data to discover exposure-mediation pathways. We then apply stochastic interventions to these pathways in an estimation fold of the data to construct efficient estimators of natural direct and indirect effects using flexible machine learning techniques. Our estimator is shown to be asymptotically linear under conditions requiring $n^{-1/4}$-consistency of the nuisance function estimators. Simulation studies demonstrate $\sqrt{n}$-consistency of our estimator when the exposure is quantized, whereas for truly continuous data, approximations in numerical integration prevent $\sqrt{n}$-consistency. Our NOVAPathways framework, part of the open-source SuperNOVA package in R, makes the proposed methodology for high-dimensional mediation analysis available to researchers, paving the way for the application of modified exposure policies that can deliver more informative statistical results for public policy.
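To make the two-fold idea concrete, the following is a minimal Python sketch, not the SuperNOVA implementation. It assumes a toy linear data-generating process, replaces the sequential semi-parametric regressions with ordinary least squares, and uses a simple plug-in estimate of the effect of shifting an exposure by `delta` rather than an efficient estimator. All function names (`simulate`, `discover_pathway`, `shift_effects`, `cross_fit`) and all coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy data (assumed): three exposures, one mediator, one outcome."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, 3))                 # continuous exposures
    Z = 0.8 * A[:, 0] + rng.normal(size=n)      # only exposure 0 acts on the mediator
    Y = 0.5 * A[:, 0] + 1.2 * Z + 0.3 * A[:, 1] + rng.normal(size=n)
    return A, Z, Y

def ols(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def discover_pathway(A, Z):
    """Discovery step (stand-in): pick the exposure with the largest
    absolute coefficient when the mediator is regressed on all exposures."""
    slopes = ols(A, Z)[1:]
    return int(np.argmax(np.abs(slopes)))

def shift_effects(A, Z, Y, j, delta):
    """Plug-in effects of shifting exposure j by delta under linear models."""
    a = ols(A, Z)[1 + j]                        # exposure j -> mediator
    coef = ols(np.column_stack([A, Z]), Y)
    c = coef[1 + j]                             # exposure j -> outcome, mediator held fixed
    b = coef[-1]                                # mediator -> outcome
    return {"direct": delta * c,
            "indirect": delta * a * b,
            "total": delta * (c + a * b)}

def cross_fit(A, Z, Y, delta=1.0, seed=1):
    """Discover the pathway on one fold, estimate effects on the other."""
    idx = np.random.default_rng(seed).permutation(len(Y))
    half = len(Y) // 2
    disc, est = idx[:half], idx[half:]
    j = discover_pathway(A[disc], Z[disc])
    return j, shift_effects(A[est], Z[est], Y[est], j, delta)

A, Z, Y = simulate(4000)
pathway, effects = cross_fit(A, Z, Y)
print(pathway, effects)
```

Keeping discovery and estimation on separate folds means the estimated effects are not computed on the same observations that selected the pathway, which is the motivation for the data-adaptive target parameter. In this toy setup the effects of a unit shift in exposure 0 are 0.5 (direct) and 0.8 × 1.2 = 0.96 (indirect) by construction.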
