Causal discovery is fundamental to scientific understanding and reliable decision-making. Existing approaches face critical limitations: purely data-driven methods suffer from statistical indistinguishability and restrictive modeling assumptions, while recent LLM-based methods either ignore statistical evidence or incorporate unverified priors that can mislead results. To this end, we propose CauScientist, a collaborative framework that synergizes LLMs as hypothesis-generating "data scientists" with probabilistic statistics as rigorous "verifiers". CauScientist employs hybrid initialization to select superior starting graphs, iteratively refines structures through LLM-proposed modifications validated by statistical criteria, and maintains an error memory to guide the search efficiently. Experiments demonstrate that CauScientist substantially outperforms purely data-driven baselines, achieving up to a 53.8% F1-score improvement and raising recall from 35.0% to 100.0%. Notably, while standalone LLM performance degrades with graph complexity, CauScientist reduces structural Hamming distance (SHD) by 44.0% compared to Qwen3-32B on 37-node graphs. Our project page is at https://github.com/OpenCausaLab/CauScientist.