Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face scalability and stability issues on high-dimensional data, especially when there is a feature-sample imbalance. Here, we show that the denoising score matching objective of diffusion models can smooth the gradients, yielding faster and more stable convergence. We also propose an adaptive k-hop acyclicity constraint that runs faster than existing constraints that require matrix inversion. We name this framework Denoising Diffusion Causal Discovery (DDCD). Unlike generative diffusion models, DDCD uses the reverse denoising process to infer a parameterized causal structure rather than to generate data. We demonstrate the competitive performance of DDCD on synthetic benchmark data. We also show that our methods are practically useful through qualitative analyses of two real-world examples. Code is available at https://github.com/haozhu233/ddcd.
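To make the idea of a k-hop acyclicity constraint concrete, the following is a minimal sketch and not DDCD's actual formulation, which the abstract does not specify. It assumes a penalty of the form $\sum_{i=1}^{k} \operatorname{tr}\big((W \circ W)^i\big)$, which is zero exactly when the weighted adjacency matrix $W$ has no directed cycle of length at most $k$, and which needs only matrix products rather than an inverse. The function name `khop_acyclicity_penalty` is hypothetical.

```python
import numpy as np


def khop_acyclicity_penalty(W, k):
    """Hypothetical k-hop penalty: sum of tr((W*W)^i) for i = 1..k.

    Assumption for illustration only; not taken from the DDCD paper.
    W * W is the elementwise square, so every entry is nonnegative and
    tr(A^i) is the total weight of closed walks of length i.
    """
    A = W * W                      # elementwise square, nonnegative
    power = np.eye(A.shape[0])     # running power A^i, starts at A^0
    total = 0.0
    for _ in range(k):
        power = power @ A          # A^i via one extra matrix product
        total += np.trace(power)   # weight of closed walks of length i
    return float(total)


# A DAG: strictly upper-triangular weights (edges 0->1, 0->2, 1->2).
W_dag = np.array([[0.0, 1.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [0.0, 0.0, 0.0]])

# A directed 3-cycle with unit weights: 0->1, 1->2, 2->0.
W_cycle = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

penalty_dag = khop_acyclicity_penalty(W_dag, k=3)
penalty_cycle_k2 = khop_acyclicity_penalty(W_cycle, k=2)
penalty_cycle_k3 = khop_acyclicity_penalty(W_cycle, k=3)
```

In this sketch the 3-cycle goes undetected with `k=2` and is caught with `k=3`. This illustrates the trade-off an adaptive choice of `k` would have to manage: a small `k` is cheap but can miss long cycles, while `k` equal to the number of nodes detects every cycle.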