Causal inference is a central goal of science: it enables researchers to use observational data to draw meaningful conclusions about the predicted outcomes of hypothetical interventions. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means of specifying unambiguously the assumptions about the causal structure underlying a phenomenon. DAGs make very few assumptions about functional and parametric form. SEM, by contrast, assumes linearity. This can lead to functional misspecification, which prevents researchers from obtaining reliable effect size estimates. To address this, we propose Super Learner Equation Modeling, a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate that it provides consistent and unbiased estimates of causal effects, that it performs competitively with SEM on linear models, and that it outperforms SEM when relationships are non-linear. We provide open-source code and a tutorial notebook with example usage, highlighting the method's ease of use.
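The following minimal sketch illustrates the misspecification argument. It is not the authors' implementation. It assumes a single edge X → Y with no confounders, so E[Y | do(X=x)] = E[Y | X=x]. It also assumes a true structural function f(x) = x², under which the interventional contrast E[Y | do(X=2)] − E[Y | do(X=0)] equals 4. The helpers `fit_poly` and `super_learner` are hypothetical. The learner library (a linear and a cubic polynomial), the 5 folds, and the grid search over convex weights are illustrative choices. Practical Super Learner implementations usually combine a larger library and solve for non-negative weights with a constrained optimizer.

```python
import numpy as np


def poly_features(x, degree):
    """Design matrix [1, x, x^2, ..., x^degree] for a 1-D input."""
    return np.vander(x, degree + 1, increasing=True)


def fit_poly(x, y, degree):
    """Hypothetical helper: least-squares polynomial fit.

    degree=1 plays the role of the linear (SEM-style) structural equation.
    """
    coef, *_ = np.linalg.lstsq(poly_features(x, degree), y, rcond=None)
    return lambda x_new: poly_features(np.asarray(x_new, dtype=float), degree) @ coef


def super_learner(x, y, degrees=(1, 3), n_folds=5, seed=0):
    """Hypothetical two-learner Super Learner.

    Chooses convex weights (w, 1 - w) for the two learners by minimising
    cross-validated mean squared error, then refits both on all data.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds  # random, balanced fold labels
    # Out-of-fold predictions for each candidate learner
    cv_pred = np.zeros((len(x), len(degrees)))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for j, d in enumerate(degrees):
            cv_pred[test, j] = fit_poly(x[train], y[train], d)(x[test])
    # Grid search over the convex weight (adequate for two learners)
    grid = np.linspace(0.0, 1.0, 101)
    risks = [np.mean((y - (w * cv_pred[:, 0] + (1 - w) * cv_pred[:, 1])) ** 2)
             for w in grid]
    w = grid[int(np.argmin(risks))]
    # Refit each learner on the full sample and combine with the chosen weight
    f0, f1 = (fit_poly(x, y, d) for d in degrees)
    return (lambda x_new: w * f0(x_new) + (1 - w) * f1(x_new)), w


# Simulated data: X -> Y with a non-linear structural function f(x) = x^2
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(-2.0, 2.0, n)
y = x ** 2 + rng.normal(0.0, 0.5, n)

# Interventional contrast E[Y | do(X=2)] - E[Y | do(X=0)]; the true value is 4
linear = fit_poly(x, y, 1)
sl, weight = super_learner(x, y)
linear_effect = (linear(np.array([2.0])) - linear(np.array([0.0])))[0]
sl_effect = (sl(np.array([2.0])) - sl(np.array([0.0])))[0]
print(f"linear estimate: {linear_effect:.3f}")
print(f"super learner estimate: {sl_effect:.3f} (weight on linear learner: {weight:.2f})")
```

Because X is symmetric around 0, Cov(X, X²) = 0. The linear fit therefore estimates a slope near zero and a contrast near 0, even though the true effect is 4. The cross-validated ensemble should place almost all of its weight on the cubic learner and recover a contrast close to 4.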