Capturing the structural causal relations represented by Directed Acyclic Graphs (DAGs) is a fundamental task in many AI disciplines. Causal DAG learning via continuous optimization has recently achieved promising accuracy and efficiency. However, most methods make the strong assumption of homoscedastic noise, i.e., that exogenous noises have equal variances across variables, across observations, or both. Noise in real data usually violates both assumptions because of biases introduced by different data collection processes. To address heteroscedastic noise, we introduce relaxed and implementable sufficient conditions and prove the identifiability of a general class of structural equation models (SEMs) subject to these conditions. Based on this identifiable general SEM, we propose a novel formulation for DAG learning that accounts for variation in noise variance across both variables and observations. We then propose an effective two-phase iterative DAG learning algorithm that addresses the resulting increase in optimization difficulty and learns a causal DAG from data whose noise variance varies across variables and observations. The proposed approaches show significant empirical gains over state-of-the-art methods on both synthetic and real data.
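To make the distinction between homoscedastic and heteroscedastic noise concrete, the following is a minimal sketch that samples from a linear SEM under both settings. It is an illustration only, not the formulation or algorithm proposed in the paper: the linear form X = XW + N, the three-variable chain graph, the edge weights, the Gaussian noise, and the names `simulate_linear_sem`, `var_scale`, and `obs_scale` are all assumptions made for this example.

```python
import numpy as np


def simulate_linear_sem(W, noise_std, rng):
    """Sample X = X W + N, where W[i, j] != 0 means an edge i -> j.

    noise_std has shape (n, d): one standard deviation per observation
    and per variable (hypothetical parameterization for illustration).
    """
    n, d = noise_std.shape
    N = rng.standard_normal((n, d)) * noise_std
    # Solve X (I - W) = N. I - W is invertible because an acyclic W is nilpotent.
    X = N @ np.linalg.inv(np.eye(d) - W)
    return X, N


rng = np.random.default_rng(0)
n, d = 1000, 3

# Assumed ground-truth DAG: the chain 0 -> 1 -> 2.
W = np.array([[0.0, 1.5, 0.0],
              [0.0, 0.0, -2.0],
              [0.0, 0.0, 0.0]])

# Homoscedastic: the same noise scale for every variable and observation.
homo_std = np.ones((n, d))

# Heteroscedastic: the scale differs per variable and per observation.
var_scale = np.array([0.5, 1.0, 2.0])            # varies across variables
obs_scale = rng.uniform(0.5, 2.0, size=(n, 1))   # varies across observations
hetero_std = obs_scale * var_scale               # broadcasts to shape (n, d)

X_homo, N_homo = simulate_linear_sem(W, homo_std, rng)
X_het, N_het = simulate_linear_sem(W, hetero_std, rng)

# Empirical noise variance per variable in each setting.
print("homoscedastic noise variance:  ", N_homo.var(axis=0))
print("heteroscedastic noise variance:", N_het.var(axis=0))
```

In the first setting the per-variable noise variances are all close to one another, while in the second they differ by variable and the scale also changes from row to row, which is the situation the equal-variance assumption rules out.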