Recursive linear structural equation models and the associated directed acyclic graphs (DAGs) play an important role in causal discovery. The classic identifiability result for this class of models states that when only observational data are available, each DAG can be identified only up to a Markov equivalence class. In contrast, recent work has shown that the DAG can be uniquely identified if the errors in the model are homoscedastic, i.e., all have the same variance. This equal-variance assumption yields methods that, when the assumption is appropriate, are highly scalable, and it also sheds light on fundamental information-theoretic limits and optimality in causal discovery. In this paper, we fill the gap between the two previously considered cases, which assume the error variances to be either arbitrary or all equal. Specifically, we formulate a framework of partial homoscedasticity, in which the variables are partitioned into blocks and the variables within each block share a common error variance. For any such groupwise equal-variance assumption, we characterize when two DAGs give rise to identical Gaussian linear structural equation models. Furthermore, we show how the resulting distributional equivalence classes may be represented using a completed partially directed acyclic graph (CPDAG), and we give an algorithm to efficiently construct this CPDAG. In a simulation study, we demonstrate that greedy search provides an effective way to learn the CPDAG and exploit partial knowledge about homoscedasticity of errors in structural equation models.
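To make the partial homoscedasticity setting concrete, the following is a minimal sketch of the covariance matrix implied by a Gaussian linear structural equation model whose error variances are constant within blocks. It is an illustration only, not the construction or algorithm of the paper. The names `sem_covariance`, `block_of`, and `block_var` are hypothetical, as is the convention that `B[i, j]` holds the coefficient of the edge i -> j, so that X = Bᵀ X + ε and Σ = (I − B)⁻ᵀ Ω (I − B)⁻¹ with Ω the diagonal matrix of error variances.

```python
import numpy as np


def sem_covariance(B, block_of, block_var):
    """Covariance implied by X = B^T X + eps with block-wise error variances.

    B         : (p, p) array, B[i, j] is the weight of edge i -> j (assumed convention).
    block_of  : length-p sequence, block_of[j] is the block label of variable j.
    block_var : dict mapping a block label to the error variance shared by that block.
    """
    p = B.shape[0]
    # Diagonal error covariance: variables in the same block share one variance.
    omega = np.diag([block_var[block_of[j]] for j in range(p)])
    # X = (I - B^T)^{-1} eps, hence Sigma = (I - B)^{-T} Omega (I - B)^{-1}.
    inv = np.linalg.inv(np.eye(p) - B)
    return inv.T @ omega @ inv


# Hypothetical example: the chain 0 -> 1 -> 2, with variables 0 and 1 in one
# block (error variance 1.0) and variable 2 in its own block (error variance 2.0).
B = np.zeros((3, 3))
B[0, 1] = 0.8
B[1, 2] = -0.5
block_of = [0, 0, 1]
block_var = {0: 1.0, 1: 2.0}

sigma = sem_covariance(B, block_of, block_var)
print(np.round(sigma, 3))
```

In this parametrization, putting all variables in a single block corresponds to the equal-variance case, and giving every variable its own block corresponds to the case of arbitrary error variances.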