Causal graph recovery is traditionally done with methods based on statistical estimation or on individuals' knowledge about the variables of interest. These approaches often suffer from data collection biases and from the limits of individuals' knowledge. Advances in large language models (LLMs) provide opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal relationships in general causal graph recovery tasks. The method draws on three sources: knowledge compressed in LLMs, knowledge that LLMs extract from scientific publication databases, and experimental data about the factors of interest. It provides a prompting strategy to extract associational relationships among those factors and a mechanism to verify causality for these associations. Compared with other LLM-based methods that directly instruct LLMs to perform highly complex causal reasoning, our method shows a clear advantage in causal graph quality on benchmark datasets. More importantly, because causality among some factors may change as new research results emerge, our method shows sensitivity to new evidence in the literature and can provide useful information for updating causal graphs accordingly.