We propose a unified class of generalized structural equation models (GSEMs) for mediation analysis with data of mixed types, including continuous, categorical, and count variables. These models substantially extend the classical linear structural equation model to accommodate the many data types that arise in applications of mediation analysis. Following a hierarchical modeling approach, we specify a GSEM through a copula joint distribution of the outcome, mediator, and exposure variables, in which each marginal distribution is built on a generalized linear model (GLM) that adjusts for confounding factors. We discuss the identifiability conditions for causal mediation effects in the counterfactual paradigm, as well as the issue of mediation leakage, and we develop asymptotically efficient profile maximum likelihood estimation and inference for two key mediation estimands, the natural direct effect and the natural indirect effect, under different scenarios of mixed data types. The proposed methodology is illustrated with a motivating epidemiological study that investigates whether the tempo of reaching the infancy BMI peak (delayed or on time), an important early-life growth milestone, may mediate the association between prenatal exposure to phthalates and pubertal health outcomes.
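
For reference, the two estimands named above are commonly written in counterfactual notation as sketched below. This is the standard textbook formulation and not necessarily the notation used in the paper: the symbols X (exposure), M (mediator), Y (outcome), and the two exposure levels x and x^* are assumptions introduced here for illustration.

```latex
% Y(x, m): potential outcome under exposure x and mediator value m
% M(x):    potential mediator value under exposure x
\begin{align*}
\mathrm{NDE}(x, x^*) &= E\big[\, Y\big(x, M(x^*)\big) - Y\big(x^*, M(x^*)\big) \,\big], \\
\mathrm{NIE}(x, x^*) &= E\big[\, Y\big(x, M(x)\big) - Y\big(x, M(x^*)\big) \,\big], \\
\mathrm{TE}(x, x^*)  &= E\big[\, Y\big(x, M(x)\big) - Y\big(x^*, M(x^*)\big) \,\big]
                      = \mathrm{NDE}(x, x^*) + \mathrm{NIE}(x, x^*).
\end{align*}
```

Under this convention the total effect splits exactly into the direct and indirect parts, because the term E[Y(x, M(x^*))] cancels when the two are added.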