Using GPT-4 to guide causal machine learning

Since its introduction to the public, ChatGPT has had an unprecedented impact. While some experts have praised AI advancements and highlighted their potential risks, others have been critical of the accuracy and usefulness of Large Language Models (LLMs). In this paper, we are interested in the ability of LLMs to identify causal relationships. We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions: we isolate its ability to infer causal relationships from the variable labels alone, without any context. This demonstrates the minimum level of effectiveness one can expect when the model is given label-only information. We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories, closely followed by knowledge graphs constructed by domain experts, with causal Machine Learning (ML) far behind. We use these results to highlight an important limitation of causal ML: it often produces causal graphs that violate common sense, which undermines trust in them. However, we show that pairing GPT-4 with causal ML overcomes this limitation, resulting in graphical structures learnt from real data that align more closely with those identified by domain experts than structures learnt by causal ML alone. Overall, our findings suggest that although GPT-4 is not explicitly designed to reason causally, it can still be a valuable tool for causal representation, as it improves the causal discovery process of causal ML algorithms that are designed to do just that.

