Recent breakthroughs in artificial intelligence have driven a paradigm shift in which large language models (LLMs) with billions or even trillions of parameters are trained on vast datasets, achieving unprecedented success across a wide range of language tasks. Despite these successes, however, LLMs still rely on probabilistic modeling, which often captures spurious correlations rooted in linguistic patterns and social stereotypes rather than the true causal relationships between entities and events. This limitation renders LLMs vulnerable to problems such as demographic biases, social stereotypes, and hallucinations. These challenges highlight the urgent need to integrate causality into LLMs, moving beyond correlation-driven paradigms to build more reliable and ethically aligned AI systems. While many existing surveys and studies focus on prompt engineering that elicits causal knowledge from LLMs, or on benchmarks that assess their causal reasoning abilities, most of these efforts depend on human intervention to activate pre-trained models. Indeed, recent research characterizes LLMs as "causal parrots" that can recite causal knowledge without truly understanding or applying it, and prompt-based methods remain confined to such human-driven improvements. How to embed causality into the training process of LLMs, and thereby build more general and intelligent models, remains largely unexplored. This survey aims to fill this gap by exploring how causality can enhance LLMs at every stage of their lifecycle, from token embedding learning and foundation model training to fine-tuning, alignment, inference, and evaluation, paving the way for more interpretable, reliable, and causally informed models. We further outline six promising future directions for advancing LLM development, strengthening causal reasoning capabilities, and addressing the limitations these models currently face.