Identifying causal relationships from data is pertinent to advancing the understanding of complex systems both within and beyond science. Bayesian networks offer a probabilistic method for modelling generic causal relationships via directed acyclic graphs (DAGs). However, typical techniques for constructing Bayesian networks rely on optimization, which can be ill-suited to learning causal relationships because the underlying data may admit multiple chains of causation. More data-faithful representations of causal relationships would provide frameworks for constructing multiple causal maps that are consistent with the variability inherent in the underlying data. Here, we show that entropy-based inference generates atlases of plausible causal relationships that are consistent with the underlying data. On simulated noisy data from 2- and 20-node linear structural equation models, we sample a maximum-entropy ensemble of graphs that allows us to quantify the inherent structural ambiguity in the underlying causal relationships. Our method shows that "optimized" DAGs can contain causal artifacts that are not consistent across equivalently accurate topologies.