Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high-belief hallucination: incorrect outputs produced with abnormally high confidence, which makes them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing significant challenges to model reliability. Through empirical analysis across different model families and sizes on several Question Answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty with delusions, which are harder to override via finetuning or self-reflection. We link delusion formation to training dynamics and dataset noise, and explore mitigation strategies such as retrieval-augmented generation and multi-agent debate. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insights into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
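To make the distinction concrete, the following is a minimal sketch (not the paper's actual procedure) of how one might operationally separate delusions from ordinary hallucinations: given an external correctness check and a per-answer confidence score (e.g., mean token probability), incorrect answers above a confidence threshold are flagged as delusions. The record fields, the `split_errors` helper, and the 0.9 threshold are hypothetical illustrations.

```python
# Minimal sketch (assumption, not the paper's method): split incorrect answers
# into "delusions" (abnormally high confidence) vs. ordinary hallucinations.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnswerRecord:
    question: str
    answer: str
    is_correct: bool   # from an external correctness check (e.g., exact match)
    confidence: float  # e.g., mean token probability; field name is hypothetical


def split_errors(
    records: List[AnswerRecord], high_conf: float = 0.9
) -> Tuple[List[AnswerRecord], List[AnswerRecord]]:
    """Return (delusions, hallucinations) among the incorrect answers."""
    delusions = [r for r in records if not r.is_correct and r.confidence >= high_conf]
    hallucinations = [r for r in records if not r.is_correct and r.confidence < high_conf]
    return delusions, hallucinations


if __name__ == "__main__":
    sample = [
        AnswerRecord("Capital of Australia?", "Sydney", False, 0.97),  # delusion-like
        AnswerRecord("Capital of Australia?", "Perth", False, 0.41),   # ordinary hallucination
        AnswerRecord("Capital of France?", "Paris", True, 0.99),       # correct, ignored
    ]
    delusions, hallucinations = split_errors(sample)
    print(f"delusions: {len(delusions)}, hallucinations: {len(hallucinations)}")
```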