We investigate the challenging task of learning causal structure in the presence of latent variables, including locating latent variables, determining their number, and identifying causal relationships among both latent and observed variables. To address this, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria for the GIN condition in linear non-Gaussian acyclic causal models. Roughly speaking, GIN implies the existence of an exogenous set $\mathcal{S}$ relative to the parent set of $\mathbf{Y}$ (w.r.t. the causal ordering), such that $\mathcal{S}$ d-separates $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With this connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the underlying causal structure of a LiNGLaH is identifiable from GIN conditions under mild assumptions. Experimental results show the effectiveness of the proposed approach.
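To make the GIN condition concrete, the following is a minimal numerical sketch, not the paper's procedure. It assumes a toy model with one latent variable `L` and three measured children `X1`, `X2`, `X3` with uniform (non-Gaussian) noise; the coefficients, the helper names `gin_omega` and `dependence_score`, and the threshold used when reading the scores are all illustrative assumptions. The vector $\omega$ is taken from the null space of the sample cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$, and the correlation of squared values serves as a crude stand-in for a proper statistical independence test.

```python
import numpy as np


def gin_omega(Y, Z):
    """Return a unit vector omega such that omega^T Cov(Y, Z) is (near) zero."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov_yz = Yc.T @ Zc / len(Y)  # sample cross-covariance, shape (dim_Y, dim_Z)
    # The last right-singular vector of cov_yz^T spans its (approximate) null space.
    _, _, vt = np.linalg.svd(cov_yz.T, full_matrices=True)
    return vt[-1]


def dependence_score(Y, Z):
    """Crude dependence measure between omega^T Y and each component of Z.

    Uses |corr(e^2, z^2)|, which is zero under independence but is only a
    heuristic proxy, not a consistent independence test.
    """
    omega = gin_omega(Y, Z)
    e = (Y - Y.mean(axis=0)) @ omega
    Zc = Z - Z.mean(axis=0)
    return max(abs(np.corrcoef(e**2, z**2)[0, 1]) for z in Zc.T)


# Toy data (assumed for illustration): one latent L with three measured children.
rng = np.random.default_rng(0)
n = 50_000
L = rng.uniform(-1, 1, n)
e1, e2, e3 = rng.uniform(-1, 1, (3, n))
X1 = 1.0 * L + e1
X2 = 0.8 * L + e2
X3 = 1.2 * L + e3

# Y = (X1, X2), Z = (X3): L d-separates Y from Z, so GIN is expected to hold.
score_holds = dependence_score(np.column_stack([X1, X2]), X3[:, None])

# Y = (X1, X2), Z = (X1): omega^T Y is uncorrelated with Z but still dependent.
score_fails = dependence_score(np.column_stack([X1, X2]), X1[:, None])

print(f"score when GIN should hold: {score_holds:.3f}")
print(f"score when GIN should fail: {score_fails:.3f}")
```

In the first case $\omega^{\intercal}\mathbf{Y}$ cancels the latent variable and reduces to a combination of the noise terms of `X1` and `X2`, so the score should be close to zero. In the second case the combination is uncorrelated with $\mathbf{Z}$ by construction but shares non-Gaussian sources with it, so the score should be clearly larger. This gap is what non-Gaussianity buys: with Gaussian variables, uncorrelatedness would imply independence and the two cases could not be told apart.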