We propose a unified class of generalized structural equation models (GSEMs) for mediation analysis with data of mixed types, including continuous, categorical, and count variables. These models substantially extend the classical linear structural equation model to accommodate the many data types that arise in applications of mediation analysis. Following a hierarchical modeling approach, we specify a GSEM through a copula-based joint distribution of the outcome, mediator, and exposure variables, in which the marginal distributions are built on generalized linear models (GLMs) that adjust for confounding factors. We discuss the identifiability conditions for causal mediation effects in the counterfactual paradigm, as well as the issue of mediation leakage. We then develop asymptotically efficient profile maximum likelihood estimation and inference for two key mediation estimands, the natural direct effect and the natural indirect effect, across different scenarios of mixed data types. The proposed methodology is illustrated with a motivating epidemiological study that investigates whether the timing of reaching the infancy BMI peak (delayed or on time), an important early-life growth milestone, mediates the association between prenatal exposure to phthalates and pubertal health outcomes.
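As a concrete reading of the two estimands, the sketch below computes the natural direct effect (NDE) and natural indirect effect (NIE) on the mean-difference scale using the standard mediation formula E[Y(a, M(a*))] = sum over m of E[Y | A = a, M = m] P(M = m | A = a*), with NDE = E[Y(1, M(0))] - E[Y(0, M(0))] and NIE = E[Y(1, M(1))] - E[Y(1, M(0))]. It is only an illustration under stated assumptions, not the copula-based profile likelihood method described above. It assumes a binary exposure, a binary mediator with a logistic GLM, a continuous outcome with a linear GLM including an exposure-mediator interaction, and no unmeasured confounding. Confounders are omitted, which amounts to working within a single confounder stratum. All function names and parameter values are hypothetical.

```python
import math


def expit(x):
    """Inverse logit link for the hypothetical logistic mediator model."""
    return 1.0 / (1.0 + math.exp(-x))


def mediator_prob(a, b0, b1):
    """P(M = 1 | A = a) under a logistic GLM: logit P = b0 + b1 * a."""
    return expit(b0 + b1 * a)


def outcome_mean(a, m, g):
    """E[Y | A = a, M = m] under a linear GLM with an A x M interaction."""
    g0, g1, g2, g3 = g
    return g0 + g1 * a + g2 * m + g3 * a * m


def counterfactual_mean(a, a_star, b0, b1, g):
    """Mediation formula: E[Y(a, M(a*))] = sum_m E[Y | a, m] P(M = m | a*)."""
    p1 = mediator_prob(a_star, b0, b1)
    return outcome_mean(a, 1, g) * p1 + outcome_mean(a, 0, g) * (1.0 - p1)


def nde_nie(b0, b1, g):
    """Return (NDE, NIE, total effect) on the mean-difference scale."""
    y00 = counterfactual_mean(0, 0, b0, b1, g)  # E[Y(0, M(0))]
    y10 = counterfactual_mean(1, 0, b0, b1, g)  # E[Y(1, M(0))]
    y11 = counterfactual_mean(1, 1, b0, b1, g)  # E[Y(1, M(1))]
    nde = y10 - y00  # change the exposure, hold the mediator at its A = 0 law
    nie = y11 - y10  # hold the exposure at 1, shift the mediator's law
    return nde, nie, y11 - y00  # total effect = NDE + NIE


# Hypothetical parameters: P(M=1|A=0) = 0.5 and P(M=1|A=1) = 0.75.
nde, nie, te = nde_nie(0.0, math.log(3.0), (1.0, 0.5, 2.0, 1.0))
print(round(nde, 6), round(nie, 6), round(te, 6))  # prints 1.0 0.75 1.75
```

For this simple model the results match the closed forms NDE = g1 + g3 P(M = 1 | A = 0) and NIE = (g2 + g3)[P(M = 1 | A = 1) - P(M = 1 | A = 0)]. When the outcome is binary or a count, the same counterfactual means can be contrasted on a ratio scale instead. A copula joint model changes how E[Y | A, M] and the mediator law are derived, not the definitions of the estimands.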