Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems such as biological networks. Existing methods that account for confounders either assume linearity or scale poorly. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach maximizes the log-likelihood of the data by alternating between optimizing the graph structure and estimating the confounder distribution. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. We also provide consistency guarantees for our framework, reinforcing its theoretical soundness.