We consider linear non-Gaussian structural equation models that involve latent confounding. In this setting, the causal structure is identifiable, but, in general, the specific causal effects are not. Instead, a finite number of different sets of causal effects result in the same observational distribution. Most existing algorithms for identifying these causal effects use overcomplete independent component analysis (ICA), which often converges to local optima. Furthermore, these algorithms require the number of latent variables to be known a priori. To address these issues, we propose an algorithm that operates recursively rather than using overcomplete ICA. The algorithm first infers a source, estimates the effect of the source and its latent parents on their descendants, and then eliminates their influence from the data. For both source identification and effect size estimation, we use rank conditions on matrices formed from higher-order cumulants. We prove asymptotic correctness under the mild assumption that, locally, the number of latent variables never exceeds the number of observed variables. Simulation studies demonstrate that our method performs comparably to overcomplete ICA even though it does not know the number of latent variables in advance.
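To make the "infer a source, estimate its effect, eliminate its influence" loop concrete, the sketch below works through a deliberately simplified case: two observed variables, no latent confounders, and skewed (exponential) noise so that third-order cumulants are nonzero. It is not the authors' algorithm, and the cumulant matrix used here is an illustrative choice, not the paper's construction. The names `cum3` and `source_score` are hypothetical helpers. The idea: if `x1` is a source and `x2 = b * x1 + e2`, then the 2x2 matrix of third-order cumulants built with `x1` in the source role equals `kappa * [[1, b], [b, b**2]]`, which has rank one. With `x2` in the source role, the matrix generally has full rank. The cumulant ratio `cum3(x1, x1, x2) / cum3(x1, x1, x1)` then recovers `b`.

```python
import numpy as np


def cum3(x, y, z):
    """Third-order joint cumulant; equals the third joint central moment E[(x-mx)(y-my)(z-mz)]."""
    x, y, z = x - x.mean(), y - y.mean(), z - z.mean()
    return float(np.mean(x * y * z))


def source_score(a, b):
    """Ratio of smallest to largest singular value of a 2x2 cumulant matrix with `a` in the
    source role (hypothetical test statistic). Values near zero indicate a numerically rank-one
    matrix, which is what we expect when `a` is a source with no latent parents."""
    m = np.array([[cum3(a, a, a), cum3(a, a, b)],
                  [cum3(a, a, b), cum3(a, b, b)]])
    s = np.linalg.svd(m, compute_uv=False)  # singular values, sorted in descending order
    return s[-1] / s[0]


rng = np.random.default_rng(0)
n = 200_000
b_true = 0.8
# Assumption: centered exponential noise, which is skewed, so third cumulants are nonzero.
e1 = rng.exponential(size=n) - 1.0
e2 = rng.exponential(size=n) - 1.0
x1 = e1                   # x1 is the true source
x2 = b_true * x1 + e2     # x2 depends linearly on x1

# Step 1: infer the source as the variable whose cumulant matrix is closest to rank one.
scores = {"x1": source_score(x1, x2), "x2": source_score(x2, x1)}
source = min(scores, key=scores.get)

# Step 2: estimate the effect of the source via a cumulant ratio (equals b in this model).
b_hat = cum3(x1, x1, x2) / cum3(x1, x1, x1)

# Step 3: eliminate the source's influence; the residual should approximate e2.
residual = x2 - b_hat * x1
```

In this toy setting `scores["x1"]` should be close to zero while `scores["x2"]` stays well away from it, and `b_hat` should be close to 0.8. Handling latent parents of the source, as the abstract describes, requires larger cumulant matrices and is not covered by this sketch. With symmetric noise (e.g. uniform), third cumulants vanish and fourth-order cumulants would be needed instead.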