Causal inference is a central goal of science: it allows researchers to draw meaningful conclusions about the effects of hypothetical interventions from observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs) provide a means to specify unambiguously the assumptions about the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about functional and parametric form, SEMs assume linearity. This can lead to functional misspecification, which prevents researchers from estimating effect sizes reliably. To address this, we propose Super Learner Equation Modeling, a path modeling technique that integrates machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects and its competitive performance on linear models when compared with SEM, and we highlight its superiority over SEM when relationships are non-linear. We provide open-source code and a tutorial notebook with example usage, which illustrate the method's ease of use.
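The effect of functional misspecification can be illustrated with a toy sketch. This is not the authors' implementation or their package's API: it assumes a single-edge graph X → Y with no confounding, an invented data-generating process (Y = X² plus noise), and a minimal two-learner ensemble whose mixing weight is chosen by cross-validated squared error. All function names are hypothetical and chosen for illustration only.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=15):
    """k-nearest-neighbour regression in one dimension."""
    data = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def cv_predictions(fit, xs, ys, folds=5):
    """Out-of-fold predictions for one candidate learner."""
    n = len(xs)
    preds = [0.0] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, n, folds):
            preds[i] = model(xs[i])
    return preds

def fit_toy_ensemble(xs, ys, steps=20):
    """Convex mix of the two learners; the weight minimises CV squared error."""
    p_lin = cv_predictions(fit_linear, xs, ys)
    p_knn = cv_predictions(fit_knn, xs, ys)
    def cv_mse(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(p_lin, p_knn, ys)) / len(ys)
    w = min((s / steps for s in range(steps + 1)), key=cv_mse)
    lin, knn = fit_linear(xs, ys), fit_knn(xs, ys)  # refit on all data
    return (lambda x: w * lin(x) + (1 - w) * knn(x)), w

# Assumed data-generating process: X -> Y, non-linear, no confounders.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(600)]
ys = [x ** 2 + rng.gauss(0, 0.3) for x in xs]

linear = fit_linear(xs, ys)
ensemble, weight_on_linear = fit_toy_ensemble(xs, ys)

# Contrast between setting X to 2 and setting X to 0.
true_effect = 2.0 ** 2 - 0.0 ** 2
linear_effect = linear(2.0) - linear(0.0)
ensemble_effect = ensemble(2.0) - ensemble(0.0)
print(f"true={true_effect:.2f} linear={linear_effect:.2f} "
      f"ensemble={ensemble_effect:.2f} weight_on_linear={weight_on_linear:.2f}")
```

Because X is symmetric around zero, the best-fitting straight line is nearly flat, so the linear model estimates an effect close to zero even though the true contrast is 4. The ensemble puts most of its weight on the flexible learner and recovers a value near the truth. A full Super Learner would use a larger library of candidate learners and a dedicated meta-learner; the grid search over a single weight here is a simplification.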