We investigate the task of learning causal structure in the presence of latent variables, which includes locating the latent variables, determining how many there are, and identifying the causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables. The condition establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria for the GIN condition in linear non-Gaussian acyclic models. Roughly speaking, GIN implies the existence of a set $\mathcal{S}$ such that $\mathcal{S}$ is causally earlier (w.r.t. the causal ordering) than $\mathbf{Y}$, and every active (collider-free) path between $\mathbf{Y}$ and $\mathbf{Z}$ contains a node from $\mathcal{S}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, the causes are independent of the residual obtained by regressing the effect on the causes) can be seen as a special case of GIN. Building on this connection between GIN and latent causal structures, we combine the proposed GIN condition with a well-designed search procedure to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), in which latent confounders may also be causally related and may even follow a hierarchical structure. We show that the causal structure of a LiNGLaH is identifiable in light of GIN conditions. Experimental results show the effectiveness of the proposed method.
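To make the GIN condition concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a toy model with one latent variable `L` and three measured variables `X1`, `X2`, `X3`, with loadings and uniform noise chosen arbitrarily for illustration. The helper names `gin_omega`, `dependence_score`, and `gin_score` are hypothetical. The vector $\omega$ is taken from the left null space of the sample cross-covariance of $\mathbf{Y}$ and $\mathbf{Z}$, and independence is approximated by a crude correlation-of-squares proxy; a proper independence test (for example, a kernel-based one) would be needed in practice.

```python
import numpy as np


def gin_omega(Y, Z):
    """Return a unit vector omega with omega^T Cov(Y, Z) ~ 0 (assumes dim Y > dim Z)."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov_yz = Yc.T @ Zc / len(Y)  # shape: (dim Y, dim Z)
    # The last left singular vector spans the left null space of cov_yz.
    U, _, _ = np.linalg.svd(cov_yz, full_matrices=True)
    return U[:, -1]


def dependence_score(e, Z):
    """Crude dependence proxy: max |corr(e^2, Z_j^2)| over the columns of Z.

    This is an assumption made for the sketch, not a real independence test.
    """
    e2 = (e - e.mean()) ** 2
    scores = []
    for j in range(Z.shape[1]):
        z2 = (Z[:, j] - Z[:, j].mean()) ** 2
        scores.append(abs(np.corrcoef(e2, z2)[0, 1]))
    return max(scores)


def gin_score(Y, Z):
    """Dependence between omega^T Y and Z; values near zero are consistent with GIN."""
    omega = gin_omega(Y, Z)
    return dependence_score(Y @ omega, Z)


rng = np.random.default_rng(0)
n = 50_000


def noise():
    # Unit-variance uniform noise, hence non-Gaussian.
    return rng.uniform(-np.sqrt(3), np.sqrt(3), n)


# Toy model (illustrative loadings): latent L is a common cause of X1, X2, X3.
L = noise()
X1 = 1.0 * L + noise()
X2 = 2.0 * L + noise()
X3 = 1.5 * L + noise()

Y = np.column_stack([X1, X2])

# Z = {X3}: omega^T Y cancels L, leaving only the noises of X1 and X2.
score_holds = gin_score(Y, X3.reshape(-1, 1))

# Z = {X1}: omega^T Y is the residual of regressing X2 on X1, which still
# depends on X1 because L confounds both.
score_violated = gin_score(Y, X1.reshape(-1, 1))

print(f"Z = X3: {score_holds:.3f}")
print(f"Z = X1: {score_violated:.3f}")
```

In this sketch the second case corresponds to the ordinary independent noise check (regress `X2` on `X1` and test the residual), which fails here because of the latent confounder, while the first case uses `X3` as a surrogate and yields a combination of `X1` and `X2` that no longer carries `L`.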