Most existing causal discovery methods assume that there are no latent confounders, which limits their applicability to real-world problems. In this paper, we introduce a novel, versatile framework for causal discovery that allows causally related hidden variables to appear almost anywhere in the causal network (for instance, they can be effects of observed variables), using rank information from the covariance matrix of the observed variables. We first investigate the efficacy of rank constraints in comparison to conditional independence constraints and establish necessary and sufficient conditions for the identifiability of certain latent structural patterns. We then develop a Rank-based Latent Causal Discovery algorithm, RLCD, that efficiently locates hidden variables, determines their cardinalities, and discovers the entire causal structure over both measured and hidden variables. We also show that, under certain graphical conditions, RLCD asymptotically identifies the correct Markov equivalence class of the whole latent causal graph. Experimental results on synthetic data and on real-world personality data sets demonstrate the efficacy of the proposed approach in finite-sample cases.
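To make the idea of "rank information" concrete, the following minimal sketch shows how the rank of a cross-covariance block can reveal the number of hidden variables linking two groups of observed variables, even when no pair of observed variables is uncorrelated. It is an illustration only and not the RLCD algorithm: it assumes a linear model with independent, unit-variance latent variables, works with population covariances rather than samples, and the loadings and the helper names `observed_covariance` and `cross_cov_rank` are hypothetical.

```python
import numpy as np


def observed_covariance(loadings, noise_var):
    """Population covariance of X = loadings @ L + E.

    Assumes (for illustration) independent unit-variance latents L and
    independent noise E with the given per-variable variances.
    """
    loadings = np.asarray(loadings, dtype=float)
    return loadings @ loadings.T + np.diag(noise_var)


def cross_cov_rank(cov, rows, cols, tol=1e-8):
    """Numerical rank of the cross-covariance block cov[rows, cols]."""
    block = cov[np.ix_(rows, cols)]
    return int(np.linalg.matrix_rank(block, tol=tol))


# Four observed variables driven by ONE hidden variable (hypothetical loadings).
one_latent = observed_covariance(
    [[1.0], [0.8], [1.2], [0.5]], noise_var=[1.0] * 4
)

# The same four observed variables driven by TWO hidden variables.
two_latents = observed_covariance(
    [[1.0, 0.3], [0.8, -0.6], [1.2, 0.9], [0.5, 1.1]], noise_var=[1.0] * 4
)

# Rank of the block between {X1, X2} and {X3, X4} in each model.
rank_one = cross_cov_rank(one_latent, [0, 1], [2, 3])
rank_two = cross_cov_rank(two_latents, [0, 1], [2, 3])

print("one latent  -> cross-covariance rank:", rank_one)
print("two latents -> cross-covariance rank:", rank_two)
```

In this toy setting the block between {X1, X2} and {X3, X4} is the product of the two groups' loading matrices, so its rank is bounded by the number of shared hidden variables. With finite samples the rank would have to be assessed with a statistical test rather than a fixed numerical tolerance.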