Capturing the underlying structural causal relations represented by Directed Acyclic Graphs (DAGs) has been a fundamental task in various AI disciplines. Causal DAG learning via the continuous optimization framework has recently achieved promising performance in terms of both accuracy and efficiency. However, most methods make the strong assumption of homoscedastic noise, i.e., that exogenous noises have equal variances across variables, across observations, or both. Noise in real data usually violates both assumptions because different data collection processes introduce different biases. To address heteroscedastic noise, we introduce relaxed and implementable sufficient conditions and prove the identifiability of a general class of structural equation models (SEMs) subject to these conditions. Based on this identifiable general SEM, we propose a novel formulation for DAG learning that accounts for variation in noise variance across both variables and observations. We then propose an effective two-phase iterative DAG learning algorithm that addresses the resulting increase in optimization difficulty and learns a causal DAG from data with heteroscedastic noise of varying variance. We show significant empirical gains of the proposed approaches over state-of-the-art methods on both synthetic and real data.
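To make "noise variance that varies across variables and observations" concrete, here is a minimal sketch, assuming a linear-Gaussian SEM `X = X W + E` in which every entry `E[k, j]` has its own known variance. It is not the paper's formulation or its two-phase algorithm. The functions `simulate`, `hetero_nll`, and `acyclicity` are hypothetical illustrations: `hetero_nll` is a per-entry weighted Gaussian negative log-likelihood score, and `acyclicity` uses the polynomial constraint `tr((I + W∘W/d)^d) - d` from DAG-GNN, swapped in here for the matrix-exponential constraint of NOTEARS so that the sketch needs only NumPy.

```python
import numpy as np


def acyclicity(W: np.ndarray) -> float:
    """Polynomial acyclicity penalty h(W) = tr((I + W*W/d)^d) - d.

    h(W) == 0 exactly when the weighted graph W has no directed cycles,
    and h(W) > 0 otherwise (DAG-GNN variant of the NOTEARS constraint).
    """
    d = W.shape[0]
    M = np.eye(d) + (W * W) / d
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)


def hetero_nll(X: np.ndarray, W: np.ndarray, log_var: np.ndarray) -> float:
    """Gaussian negative log-likelihood (up to a constant) of X = X W + E,
    where E[k, j] ~ N(0, exp(log_var[k, j])): the noise variance may differ
    for every observation k and every variable j."""
    R = X - X @ W  # residual of each variable given its parents, per row
    return float(0.5 * np.sum(R**2 * np.exp(-log_var) + log_var))


def simulate(n: int, W: np.ndarray, rng: np.random.Generator):
    """Sample n rows from a linear SEM with heteroscedastic noise.

    Hypothetical variance pattern: each variable j has a base variance,
    scaled by a per-observation factor, so Var(E[k, j]) = base[j] * obs[k].
    """
    d = W.shape[0]
    base = rng.uniform(0.5, 2.0, size=d)          # varies across variables
    obs = rng.uniform(0.5, 2.0, size=(n, 1))      # varies across observations
    sigma = np.sqrt(base * obs)                   # shape (n, d)
    E = rng.normal(size=(n, d)) * sigma
    # X = X W + E  =>  X (I - W) = E  =>  X = E (I - W)^{-1}
    X = E @ np.linalg.inv(np.eye(d) - W)
    return X, 2.0 * np.log(sigma)


# Usage: chain 0 -> 1 -> 2, where W[i, j] is the weight of edge i -> j.
rng = np.random.default_rng(0)
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 0.0, 0.0]])
X, log_var = simulate(2000, W_true, rng)

W_cyclic = W_true.copy()
W_cyclic[2, 0] = 0.5  # adds edge 2 -> 0, closing the cycle 0 -> 1 -> 2 -> 0

print("h(W_true)   =", acyclicity(W_true))    # should be zero for a DAG
print("h(W_cyclic) =", acyclicity(W_cyclic))  # positive because of the cycle
print("NLL at W_true:", hetero_nll(X, W_true, log_var))
print("NLL at W = 0 :", hetero_nll(X, np.zeros_like(W_true), log_var))
```

A score-based learner in this style would minimize a score such as `hetero_nll` over `W` (and, when the variances are unknown, over `log_var` as well) subject to `acyclicity(W) == 0`; estimating the variances jointly with the graph is one source of the extra optimization difficulty the abstract mentions.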