Causal discovery is fundamental to scientific research, yet traditional statistical algorithms face significant challenges, including expensive data collection, redundant computation for known relations, and unrealistic assumptions. While recent LLM-based methods excel at identifying commonly known causal relations, they fail to uncover novel relations. We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations. Starting with a set of initial variables, IRIS automatically collects relevant documents, extracts variables, and uncovers causal relations. Our hybrid causal discovery method combines statistical algorithms and LLM-based methods to discover known and novel causal relations. In addition to causal discovery on initial variables, the missing variable proposal component of IRIS identifies and incorporates missing variables to expand the causal graphs. Our approach enables real-time causal discovery from only a set of initial variables without requiring pre-existing datasets.
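As a rough illustration of the two ideas named above (combining statistical and LLM-based edges, then expanding the graph with proposed variables), a minimal sketch might look like the following. Everything here is an assumption made for illustration: the names `hybrid_edges` and `expand_graph`, the callables `propose` and `discover`, and the rule that a statistical edge wins a direction conflict are hypothetical and are not taken from IRIS itself.

```python
from typing import Callable, Iterable

Edge = tuple[str, str]  # (cause, effect)


def hybrid_edges(statistical: Iterable[Edge], llm: Iterable[Edge]) -> set[Edge]:
    """Union two edge sets; on a direction conflict keep the statistical edge.

    The conflict rule is an assumption for this sketch, not the paper's rule.
    """
    stat = set(statistical)
    merged = set(stat)
    for cause, effect in llm:
        # Skip an LLM edge only if the statistical set asserts the reverse direction.
        if (effect, cause) not in stat:
            merged.add((cause, effect))
    return merged


def expand_graph(
    variables: set[str],
    edges: set[Edge],
    propose: Callable[[set[str]], set[str]],
    discover: Callable[[set[str]], set[Edge]],
    max_rounds: int = 3,
) -> tuple[set[str], set[Edge]]:
    """Repeatedly add proposed variables and re-run discovery until nothing new appears."""
    variables, edges = set(variables), set(edges)  # copy so inputs are not mutated
    for _ in range(max_rounds):
        new_vars = propose(variables) - variables
        if not new_vars:
            break
        variables |= new_vars
        edges |= discover(variables)
    return variables, edges


# Toy stand-ins for the retrieval, statistical, and LLM components (hypothetical).
stat_edges = {("smoking", "cancer")}
llm_edges = {("cancer", "smoking"), ("pollution", "cancer")}
merged = hybrid_edges(stat_edges, llm_edges)

final_vars, final_edges = expand_graph(
    {"smoking", "cancer"},
    {("smoking", "cancer")},
    propose=lambda vs: {"pollution"} if "cancer" in vs else set(),
    discover=lambda vs: {("pollution", "cancer")} if "pollution" in vs else set(),
)
```

In this toy run the reversed LLM edge is dropped in favour of the statistical one, and the expansion loop stops once the proposer returns no unseen variables. A real system would replace the lambdas with document retrieval, a statistical discovery algorithm, and LLM queries.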