Recent breakthroughs in artificial intelligence have driven a paradigm shift in which large language models (LLMs) with billions or even trillions of parameters are trained on vast datasets, achieving unprecedented success across a wide range of language tasks. Despite these successes, however, LLMs still rely on probabilistic modeling, which often captures spurious correlations rooted in linguistic patterns and social stereotypes rather than the true causal relationships between entities and events. This limitation renders LLMs vulnerable to problems such as demographic biases, social stereotypes, and hallucinations. These challenges highlight the urgent need to integrate causality into LLMs, moving beyond correlation-driven paradigms to build more reliable and ethically aligned AI systems. While many existing surveys and studies focus on prompt engineering that elicits causal knowledge from LLMs, or on benchmarks that assess their causal reasoning abilities, most of these efforts depend on human intervention to activate pre-trained models. Indeed, recent research characterizes LLMs as "causal parrots" that can recite causal knowledge without truly understanding or applying it, and prompt-based methods remain confined to such human-driven improvements. How to embed causality into the training process of LLMs, and thereby build more general and intelligent models, remains largely unexplored. This survey aims to fill this gap by exploring how causality can enhance LLMs at every stage of their lifecycle, from token embedding learning and foundation model training to fine-tuning, alignment, inference, and evaluation, paving the way for more interpretable, reliable, and causally informed models. We further outline six promising future directions for advancing LLM development, strengthening causal reasoning capabilities, and addressing the limitations these models currently face.