Causal inference is a central goal of science, enabling researchers to use observational data to draw meaningful conclusions about the effects of hypothetical interventions. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means to unambiguously specify assumptions about the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about functional and parametric form, SEM assumes linearity. When the true relationships are non-linear, this assumption leads to functional misspecification, which prevents researchers from estimating effect sizes reliably. To address this, we propose Super Learner Equation Modeling, a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate that it provides consistent and unbiased estimates of causal effects, that it performs competitively with SEM on linear models, and that it outperforms SEM when relationships are non-linear. We provide open-source code and a tutorial notebook with example usage to illustrate the method's ease of use.
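The contrast between a linear path estimate and a Super Learner estimate can be illustrated with a small simulation. The sketch below is not the authors' implementation or API. It assumes a hypothetical two-node graph X -> Y with no confounding (so the interventional mean equals the conditional mean), a quadratic effect, polynomial fits of degree 1, 2, and 5 as stand-in candidate learners, and a coarse grid search for the ensemble weights in place of a proper constrained optimizer. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for an assumed graph X -> Y with a quadratic effect.
n = 2000
x = rng.normal(size=n)
y = x**2 + rng.normal(scale=0.5, size=n)

def fit_poly(x_train, y_train, degree):
    """Least-squares polynomial fit; returns a prediction function."""
    coefs = np.polyfit(x_train, y_train, degree)
    return lambda x_new: np.polyval(coefs, x_new)

# Candidate learners: degree 1 mimics a linear path model, the others are flexible.
degrees = [1, 2, 5]

# Out-of-fold predictions for every candidate (5-fold cross-validation).
folds = np.array_split(rng.permutation(n), 5)
oof = np.zeros((n, len(degrees)))
for test_idx in folds:
    train_mask = np.ones(n, dtype=bool)
    train_mask[test_idx] = False
    for j, d in enumerate(degrees):
        model = fit_poly(x[train_mask], y[train_mask], d)
        oof[test_idx, j] = model(x[test_idx])

# Ensemble weights: the convex combination with the lowest cross-validated
# squared error, found by a coarse grid search over the simplex.
grid = np.linspace(0.0, 1.0, 21)
best_w, best_risk = None, np.inf
for w1 in grid:
    for w2 in grid:
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:
            continue  # outside the simplex
        w = np.array([w1, w2, max(w3, 0.0)])
        risk = np.mean((y - oof @ w) ** 2)
        if risk < best_risk:
            best_w, best_risk = w, risk

# Refit every candidate on the full data and combine with the learned weights.
models = [fit_poly(x, y, d) for d in degrees]

def super_learner(x_new):
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    preds = np.column_stack([m(x_new) for m in models])
    return preds @ best_w

# Effect of setting X to 2 versus 0; the true value is 2**2 - 0**2 = 4.
true_effect = 2.0**2 - 0.0**2
linear_effect = float(models[0](2.0) - models[0](0.0))
sl_effect = float((super_learner(2.0) - super_learner(0.0))[0])

print("weights:", best_w)
print("linear estimate:", linear_effect)
print("super learner estimate:", sl_effect)
```

Because X is symmetric around zero, the best linear fit to a quadratic relationship has a slope near zero, so the linear estimate misses the effect almost entirely. The cross-validated ensemble shifts its weight toward the flexible candidates and recovers a value close to the truth. In a real analysis the candidate library would typically contain richer learners, and each equation in the path model would be fitted this way.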