Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high-belief hallucination: incorrect outputs produced with abnormally high confidence, which makes them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing significant challenges to model reliability. Through empirical analysis across different model families and sizes on several Question Answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty with delusions, which are harder to override via finetuning or self-reflection. We link delusion formation to training dynamics and dataset noise, and explore mitigation strategies such as retrieval-augmented generation and multi-agent debate. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insights into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
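To make the distinction concrete, the following is a minimal sketch (not the paper's actual procedure) of how one might operationally separate delusions from ordinary hallucinations: given an external correctness check and a per-answer confidence score (e.g., mean token probability), incorrect answers above a confidence threshold are flagged as delusions. The record fields, the `split_errors` helper, and the 0.9 threshold are hypothetical illustrations.

```python
# Minimal sketch (assumption, not the paper's method): split incorrect answers
# into "delusions" (abnormally high confidence) vs. ordinary hallucinations.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnswerRecord:
    question: str
    answer: str
    is_correct: bool   # from an external correctness check (e.g., exact match)
    confidence: float  # e.g., mean token probability; field name is hypothetical


def split_errors(
    records: List[AnswerRecord], high_conf: float = 0.9
) -> Tuple[List[AnswerRecord], List[AnswerRecord]]:
    """Return (delusions, hallucinations) among the incorrect answers."""
    delusions = [r for r in records if not r.is_correct and r.confidence >= high_conf]
    hallucinations = [r for r in records if not r.is_correct and r.confidence < high_conf]
    return delusions, hallucinations


if __name__ == "__main__":
    sample = [
        AnswerRecord("Capital of Australia?", "Sydney", False, 0.97),  # delusion-like
        AnswerRecord("Capital of Australia?", "Perth", False, 0.41),   # ordinary hallucination
        AnswerRecord("Capital of France?", "Paris", True, 0.99),       # correct, ignored
    ]
    delusions, hallucinations = split_errors(sample)
    print(f"delusions: {len(delusions)}, hallucinations: {len(hallucinations)}")
```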