Causal inference is a crucial goal of science, enabling researchers to draw meaningful conclusions from observational data about the effects of hypothetical interventions. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means to specify unambiguously the assumptions about the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form of the relationships, SEM assumes linearity. This can result in functional misspecification, which prevents researchers from reliably estimating effect sizes. To address this, we propose Super Learner Equation Modeling, a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate that it provides consistent and unbiased estimates of causal effects, that it performs competitively with SEM for linear models, and that it outperforms SEM when relationships are non-linear. We provide open-source code and a tutorial notebook with example usage, highlighting the method's ease of use.
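The central claim above is that assuming linearity can bias effect estimates when the true structural equations are non-linear, and that fitting each equation with a Super Learner avoids this. The sketch below illustrates that contrast under assumptions that are ours, not the paper's:

- a toy graph Z → X, Z → Y, X → Y with Y = X² + Z + noise;
- a two-candidate Super Learner (least squares on linear features and on quadratic features) combined by a cross-validated convex weight;
- g-computation to estimate E[Y | do(X=2)] − E[Y | do(X=0)], whose true value is 4.

All function names are hypothetical, and this is not the authors' released code.

```python
import random

# Illustrative sketch only: hypothetical helpers, not the authors' released code.


def solve(a, b):
    """Solve the square system a·w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def linear_features(x, z):
    return [1.0, x, z]


def quadratic_features(x, z):
    return [1.0, x, z, x * x, z * z, x * z]


def fit_ols(features, xs, zs, ys):
    """Ordinary least squares via the normal equations; returns a predictor f(x, z)."""
    rows = [features(x, z) for x, z in zip(xs, zs)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve(xtx, xty)
    return lambda x, z: sum(wi * fi for wi, fi in zip(w, features(x, z)))


CANDIDATES = [linear_features, quadratic_features]


def super_learner(xs, zs, ys, folds=5):
    """Two-candidate Super Learner: V-fold out-of-fold predictions, then a convex weight."""
    n = len(ys)
    oof = [[0.0] * n for _ in CANDIDATES]  # out-of-fold predictions per candidate
    for k in range(folds):
        test = [i for i in range(n) if i % folds == k]
        train = [i for i in range(n) if i % folds != k]
        for c, feats in enumerate(CANDIDATES):
            model = fit_ols(feats, [xs[i] for i in train], [zs[i] for i in train],
                            [ys[i] for i in train])
            for i in test:
                oof[c][i] = model(xs[i], zs[i])
    # Weight alpha on the linear candidate minimising out-of-fold squared error,
    # clipped to [0, 1] (closed form for exactly two candidates).
    p1, p2 = oof
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    alpha = min(1.0, max(0.0, num / den))
    m1 = fit_ols(CANDIDATES[0], xs, zs, ys)  # refit each candidate on all data
    m2 = fit_ols(CANDIDATES[1], xs, zs, ys)
    return (lambda x, z: alpha * m1(x, z) + (1 - alpha) * m2(x, z)), alpha


def g_computation(model, zs, x_treat, x_ctrl):
    """Average effect of do(X=x_treat) versus do(X=x_ctrl), averaging over observed Z."""
    return sum(model(x_treat, z) - model(x_ctrl, z) for z in zs) / len(zs)


rng = random.Random(0)
zs = [rng.uniform(-2, 2) for _ in range(2000)]
xs = [0.5 * z + rng.gauss(0, 1) for z in zs]                       # Z -> X
ys = [x * x + z + rng.gauss(0, 0.5) for x, z in zip(xs, zs)]     # X -> Y (non-linear), Z -> Y

linear = fit_ols(linear_features, xs, zs, ys)  # SEM-style linear structural equation
sl, alpha = super_learner(xs, zs, ys)
linear_effect = g_computation(linear, zs, 2.0, 0.0)
sl_effect = g_computation(sl, zs, 2.0, 0.0)
print("true effect: 4.0")
print(f"linear estimate: {linear_effect:.2f}")
print(f"Super Learner estimate: {sl_effect:.2f} (weight on linear learner: {alpha:.2f})")
```

In this setup X is symmetric around zero, so X² is uncorrelated with X. As a result, the linear fit should attribute almost no effect to X. The Super Learner should place nearly all of its weight on the quadratic candidate and give an estimate close to 4. A practical Super Learner would use a richer candidate library (for example tree ensembles or splines) and a general constrained meta-learner rather than the two-candidate closed form used here.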