Causality is essential for understanding complex systems such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former, like the celebrated PC algorithm, face issues with data requirements and the assumption of causal sufficiency, while the latter demand substantial time and domain knowledge. This work explores the capabilities of Large Language Models (LLMs) as an alternative to domain experts for causal graph generation. We frame conditional independence queries as prompts to LLMs and run the PC algorithm on the answers. The performance of the LLM-based conditional independence oracle on systems with known causal graphs shows a high degree of variability. We improve performance through a proposed statistics-inspired voting schema that allows some control over false-positive and false-negative rates. Inspecting the chain-of-thought argumentation, we find the LLM using causal reasoning to justify its answers to probabilistic queries. We present evidence that knowledge-based conditional independence testing (CIT) could eventually become a complementary tool for data-driven causal discovery.
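The voting schema mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the oracle callable, vote count, and threshold are hypothetical illustrations. The idea is to query the LLM several times per conditional independence question and declare independence only when the number of "independent" answers clears a threshold; raising the threshold trades false positives (spuriously declared independences, which remove true edges in PC) for false negatives.

```python
from typing import Callable, Tuple

# Type of a hypothetical LLM-backed oracle: given variables X, Y and a
# conditioning set Z, it returns True if the model answers "independent".
CIOracle = Callable[[str, str, Tuple[str, ...]], bool]

def voted_ci(oracle: CIOracle, x: str, y: str, z: Tuple[str, ...],
             n_votes: int = 11, threshold: int = 8) -> bool:
    """Repeat the conditional independence query `n_votes` times and
    declare independence only if at least `threshold` answers agree.
    A higher threshold lowers the false-positive rate of the combined
    oracle; a lower one lowers the false-negative rate."""
    votes = sum(oracle(x, y, z) for _ in range(n_votes))
    return votes >= threshold
```

In a full pipeline, `voted_ci` would replace the statistical conditional independence test inside a PC implementation, with `oracle` wrapping the actual prompt-and-parse call to the LLM.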