One established approach to causal discovery combines directed acyclic graphs (DAGs) with structural causal models (SCMs) to describe the functional dependence of effects on their causes. Whether an SCM is identifiable from data depends on the assumptions made about the noise variables and the function classes in the SCM. For instance, in the LiNGAM model, the function class is restricted to linear functions and the disturbances must be non-Gaussian. In this work, we propose TSLiNGAM, a new method for identifying the DAG of a causal model from observational data. TSLiNGAM builds on DirectLiNGAM, a popular algorithm that uses simple OLS regression to identify causal directions between variables. TSLiNGAM leverages the non-Gaussianity assumption on the error terms in the LiNGAM model to obtain more efficient and robust estimation of the causal structure. TSLiNGAM is justified theoretically and studied empirically in an extensive simulation study. It performs significantly better on heavy-tailed and skewed data and demonstrates high small-sample efficiency. In addition, TSLiNGAM is more robust, being more resilient to contamination of the data.
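To make the role of non-Gaussianity concrete, the following is a minimal sketch of pairwise direction finding in a two-variable linear model with uniform (non-Gaussian) noise. It is not the TSLiNGAM estimator and not the DirectLiNGAM implementation: the function names, the simulated coefficient 0.8, and the dependence score (correlation between squared regressor and squared OLS residual) are all assumptions chosen for illustration, standing in for the independence measures such algorithms use in practice.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def dependence_score(cause, effect):
    """Regress `effect` on `cause` by OLS and score how strongly the
    residual still depends on the regressor.

    Hypothetical score for illustration: OLS residuals are always
    uncorrelated with the regressor, so we correlate the *squared*
    values instead to pick up higher-order dependence.
    """
    n = len(cause)
    mc, me = sum(cause) / n, sum(effect) / n
    c = [v - mc for v in cause]
    e = [v - me for v in effect]
    slope = sum(a * b for a, b in zip(c, e)) / sum(a * a for a in c)
    resid = [b - slope * a for a, b in zip(c, e)]
    return abs(pearson([a * a for a in c], [r * r for r in resid]))


def simulate(n=5000, b=0.8, seed=0):
    """Draw from x1 = e1, x2 = b * x1 + e2 with independent uniform noise."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x2 = [b * a + rng.uniform(-1.0, 1.0) for a in x1]
    return x1, x2


x1, x2 = simulate()
forward = dependence_score(x1, x2)   # candidate direction x1 -> x2
backward = dependence_score(x2, x1)  # candidate direction x2 -> x1

# Prefer the direction whose residual looks more independent of the regressor.
direction = "x1 -> x2" if forward < backward else "x2 -> x1"
print(direction, round(forward, 3), round(backward, 3))
```

In the true causal direction the residual is essentially the independent noise term, so the score stays near zero. In the reverse direction the residual mixes both noise terms and remains dependent on the regressor even though it is uncorrelated with it. With Gaussian noise both scores would be near zero and the direction could not be told apart, which is why the model requires non-Gaussian disturbances.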