We propose a unified class of generalized structural equation models (GSEMs) for mediation analysis with data of mixed types, including continuous, categorical, and count variables. These models substantially extend the classical linear structural equation model to accommodate the many data types that arise in applications of mediation analysis. Using a hierarchical modeling approach, we specify GSEMs through a copula joint distribution of the outcome, mediator, and exposure variables, in which the marginal distributions are built on generalized linear models (GLMs) that adjust for confounding factors. We discuss the identifiability conditions for causal mediation effects in the counterfactual paradigm, as well as the issue of mediation leakage, and develop asymptotically efficient profile maximum likelihood estimation and inference for two key mediation estimands, the natural direct effect and the natural indirect effect, in different scenarios of mixed data types. The proposed methodology is illustrated by a motivating epidemiological study that investigates whether the tempo of reaching the infancy BMI peak (delayed or on time), an important early-life growth milestone, mediates the association between prenatal exposure to phthalates and pubertal health outcomes.
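For orientation, the two estimands named above are commonly written in counterfactual notation as follows. This is a sketch in standard textbook notation and not necessarily the notation used in the paper: the symbols $X$ (exposure), $M$ (mediator), $Y$ (outcome), the exposure levels $x$ and $x^{*}$, and the potential-outcome notation $Y(x, M(x^{*}))$ are assumptions introduced here for illustration.

\[
\begin{aligned}
\mathrm{NDE}(x, x^{*}) &= E\{Y(x, M(x^{*}))\} - E\{Y(x^{*}, M(x^{*}))\}, \\
\mathrm{NIE}(x, x^{*}) &= E\{Y(x, M(x))\} - E\{Y(x, M(x^{*}))\}, \\
\mathrm{TE}(x, x^{*}) &= E\{Y(x, M(x))\} - E\{Y(x^{*}, M(x^{*}))\} = \mathrm{NDE}(x, x^{*}) + \mathrm{NIE}(x, x^{*}).
\end{aligned}
\]

Under these definitions the total effect decomposes exactly into the direct and indirect parts, because the cross-world term $E\{Y(x, M(x^{*}))\}$ cancels in the sum.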