Causal inference is a central goal of science: it lets researchers draw meaningful conclusions about the effects of hypothetical interventions from observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means to specify unambiguously the assumptions made about the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about functional and parametric form, SEMs assume linearity. When the true relationships are non-linear, this assumption can lead to functional misspecification, which prevents reliable effect size estimation. To address this, we propose Super Learner Equation Modeling, a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate that it provides consistent and unbiased estimates of causal effects, performs competitively with SEM on linear models, and outperforms SEM when relationships are non-linear. We provide open-source code and a tutorial notebook with example usage, illustrating how easy the method is to use.
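The misspecification argument can be illustrated with a small simulation. The sketch below is not the authors' implementation or their released code: it assumes a single path X → Y with a hypothetical data-generating process Y = X² + noise, and it stands in for a Super Learner with a two-learner ensemble (a linear fit and a Gaussian kernel smoother) whose convex weight is chosen by cross-validated squared error. All function names, the bandwidth, and the fold count are illustrative assumptions.

```python
import math
import random
import statistics


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_kernel(xs, ys, bandwidth=0.3):
    """Nadaraya-Watson regression with a Gaussian kernel (assumed bandwidth)."""
    def predict(x):
        ws = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * yi for w, yi in zip(ws, ys)) / sum(ws)
    return predict


def super_learner(xs, ys, n_folds=5):
    """Two-learner ensemble with a convex weight minimising cross-validated squared error."""
    cv_lin, cv_ker, cv_y = [], [], []
    for k in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != k]
        held_out = [i for i in range(len(xs)) if i % n_folds == k]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        lin, ker = fit_linear(tx, ty), fit_kernel(tx, ty)
        for i in held_out:
            cv_lin.append(lin(xs[i]))
            cv_ker.append(ker(xs[i]))
            cv_y.append(ys[i])
    # Minimise sum((y - w*lin - (1 - w)*ker)^2) over w, then clip to [0, 1].
    num = sum((a - b) * (y - b) for a, b, y in zip(cv_lin, cv_ker, cv_y))
    den = sum((a - b) ** 2 for a, b in zip(cv_lin, cv_ker))
    w_lin = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    # Refit both learners on all data and combine with the chosen weight.
    lin, ker = fit_linear(xs, ys), fit_kernel(xs, ys)
    return (lambda x: w_lin * lin(x) + (1.0 - w_lin) * ker(x)), w_lin


# Hypothetical non-linear structural equation: Y = X^2 + noise, no confounding.
random.seed(0)
n = 500
xs = [random.uniform(-3.0, 3.0) for _ in range(n)]
ys = [x ** 2 + random.gauss(0.0, 0.5) for x in xs]

linear = fit_linear(xs, ys)
ensemble, w_lin = super_learner(xs, ys)

# Contrast between two interventions: E[Y | do(X=2)] - E[Y | do(X=0)].
true_effect = 2.0 ** 2 - 0.0 ** 2
linear_effect = linear(2.0) - linear(0.0)
ensemble_effect = ensemble(2.0) - ensemble(0.0)

print(f"true effect:      {true_effect:.2f}")
print(f"linear estimate:  {linear_effect:.2f}")
print(f"ensemble estimate:{ensemble_effect:.2f} (weight on linear learner: {w_lin:.2f})")
```

Because X is symmetric around zero in this setup, the best linear fit has a slope near zero, so the linear estimate of the contrast is far from the true value, while the cross-validated ensemble shifts almost all of its weight to the flexible learner. A full Super Learner would typically combine many more candidate learners and handle multiple parents per node in the graph.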