Causal discovery seeks to uncover causal relations from data, typically represented as causal graphs, and is essential for predicting the effects of interventions. Constructing principled causal graphs requires expert knowledge, yet many statistical methods have been proposed that leverage observational data, with varying formal guarantees. Causal Assumption-Based Argumentation (Causal ABA) is a framework that uses symbolic reasoning to guarantee correspondence between input constraints and output graphs, while offering a principled way to combine data and expertise. We explore the use of large language models (LLMs) as imperfect experts for Causal ABA, eliciting semantic structural priors from variable names and descriptions and integrating them with conditional-independence evidence. Experiments on standard benchmarks and semantically grounded synthetic graphs demonstrate state-of-the-art performance, and we additionally introduce an evaluation protocol that mitigates memorisation bias when assessing LLMs for causal discovery.