Banishing LLM Hallucinations Requires Rethinking Generalization

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first-generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
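To make the link between training loss and hallucination concrete, the following minimal sketch relates per-token cross-entropy loss to the probability a model assigns to the correct token. It is not the theoretical construction referred to in the abstract, whose details are not given here. It relies only on the standard identity p = exp(-loss) for a loss measured in nats, and the function names are hypothetical. Under greedy decoding, the correct token is guaranteed to be emitted whenever p > 0.5, that is, whenever the loss on that token is below ln 2 (about 0.693).

```python
import math


def prob_of_correct_token(loss_nats: float) -> float:
    """Probability assigned to the correct token, given its cross-entropy loss in nats.

    Hypothetical helper: uses the identity loss = -ln(p), so p = exp(-loss).
    """
    return math.exp(-loss_nats)


def greedy_guaranteed_correct(loss_nats: float) -> bool:
    """True when the correct token holds more than half of the probability mass.

    In that case no other token can have higher probability, so greedy
    decoding must emit the correct token. This is a sufficient condition,
    not a necessary one.
    """
    return prob_of_correct_token(loss_nats) > 0.5


if __name__ == "__main__":
    # Illustrative loss values only; they are not taken from the paper.
    for loss in (0.01, 0.5, 0.7, 1.75):
        p = prob_of_correct_token(loss)
        print(f"loss={loss:.2f}  p(correct)={p:.3f}  "
              f"guaranteed under greedy decoding: {greedy_guaranteed_correct(loss)}")
```

Read this way, a per-token loss near zero on a fact means the fact is reproduced almost deterministically, while a loss above ln 2 leaves room for another token to be preferred.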

