Large Language Models (LLMs) exhibit exceptional abilities for causal analysis between concepts in numerous societally impactful domains, including medicine, science, and law. Recent research on LLM performance in various causal discovery and inference tasks has added a new rung to the classical three-stage ladder of causality. In this paper, we advance current research on LLM-driven causal discovery by proposing a novel framework that combines knowledge-based LLM causal analysis with data-driven causal structure learning. To make LLMs more than query tools and to leverage their power in discovering natural and novel causal laws, we integrate the valuable expertise LLMs hold on existing causal mechanisms into the statistical analysis of objective data, building a novel and practical baseline for causal structure learning. We introduce a universal set of prompts designed to extract causal graphs from given variables, and we assess how LLM-derived causal priors influence the recovery of causal structures from data. We demonstrate that LLM expertise significantly improves the quality of causal structures recovered from data, while also identifying critical challenges and issues, along with potential approaches to address them. As a pioneering study, this paper aims to highlight the new frontier that LLMs are opening for classical causal discovery and inference, and to encourage the widespread adoption of LLM capabilities in data-driven causal analysis.
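The abstract describes feeding LLM-derived causal knowledge into data-driven structure learning without spelling out the mechanism. As a minimal sketch, assuming linear-Gaussian data and treating the LLM's answers as hard edge constraints (one simple way to encode prior knowledge, not necessarily the integration strategy used in this paper), the Python below runs a greedy hill-climbing search over DAGs scored by BIC. The functions `node_bic`, `is_acyclic`, `hill_climb` and the `llm_prior` dictionary are hypothetical names introduced for illustration, and the "LLM answers" are hard-coded stand-ins rather than real model output.

```python
import numpy as np
from itertools import permutations


def node_bic(data, child, parents):
    """BIC contribution of one node under a linear-Gaussian model (higher is better)."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    return loglik - 0.5 * k * np.log(n)


def is_acyclic(adj):
    """Kahn's algorithm: True if adj (adj[i, j] = 1 means edge i -> j) is a DAG."""
    indeg = adj.sum(axis=0).astype(int)  # column sums are in-degrees (copy)
    queue = [i for i in range(len(adj)) if indeg[i] == 0]
    seen = 0
    while queue:
        i = queue.pop()
        seen += 1
        for j in np.flatnonzero(adj[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == len(adj)


def hill_climb(data, required=(), forbidden=()):
    """Greedy edge addition/deletion maximizing total BIC, subject to hard prior constraints."""
    d = data.shape[1]
    adj = np.zeros((d, d), dtype=int)
    for i, j in required:  # prior "i causes j" edges are fixed from the start
        adj[i, j] = 1
    assert is_acyclic(adj), "required edges must not form a cycle"
    required, forbidden = set(required), set(forbidden)

    def score(a):
        return sum(node_bic(data, j, list(np.flatnonzero(a[:, j]))) for j in range(d))

    best = score(adj)
    improved = True
    while improved:
        improved = False
        for i, j in permutations(range(d), 2):
            cand = adj.copy()
            if adj[i, j]:
                if (i, j) in required:
                    continue  # never delete an edge the prior insists on
                cand[i, j] = 0
            else:
                if (i, j) in forbidden:
                    continue  # never add an edge the prior rules out
                cand[i, j] = 1
                if not is_acyclic(cand):
                    continue
            s = score(cand)
            if s > best + 1e-9:  # accept the first strict improvement
                adj, best, improved = cand, s, True
    return adj


# Synthetic chain X0 -> X1 -> X2 for illustration.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + 0.5 * rng.normal(size=n)
x2 = -2.0 * x1 + 0.5 * rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

# Hypothetical LLM answers, e.g. parsed from prompts like "Does X0 cause X1?"
llm_prior = {"required": [(0, 1)], "forbidden": [(2, 0)]}
learned = hill_climb(data, **llm_prior)
print(learned)
```

Required edges are placed in the starting graph and never deleted, and forbidden edges are never added; every other edge is decided by the data. Hard constraints are the bluntest choice, since a wrong LLM answer cannot be overridden by the data; a softer variant would instead add a bonus or penalty to the score for edges the LLM asserts or denies.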