Since its public release, ChatGPT has had an unprecedented impact. While some experts have praised advances in AI and highlighted their potential risks, others have been critical of the accuracy and usefulness of Large Language Models (LLMs). In this paper, we investigate the ability of LLMs to identify causal relationships. We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions. We isolate its ability to infer causal relationships solely from variable labels, without any context, which establishes the minimum level of effectiveness one can expect when it is given label-only information. We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories. Knowledge graphs constructed by domain experts follow closely, and causal Machine Learning (ML) lags far behind. We use these results to highlight an important limitation of causal ML: it often produces causal graphs that violate common sense, which undermines trust in them. However, we show that pairing GPT-4 with causal ML overcomes this limitation. The resulting graphical structures, learnt from real data, align more closely with those identified by domain experts than structures learnt by causal ML alone. Overall, our findings suggest that although GPT-4 is not explicitly designed to reason causally, it can still be a valuable tool for causal representation, because it improves the causal discovery process of causal ML algorithms that are designed to do exactly that.