Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that requires neither conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the logits obtained from projecting later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLM has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves truthfulness across multiple-choice tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17 absolute percentage points, demonstrating its potential in making LLMs reliably generate truthful facts.
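
To make the contrast step concrete, the following is a minimal sketch rather than the reference implementation. It assumes a single fixed early layer, one unembedding matrix shared by both layers, and a contrast taken as the difference of log-probabilities followed by renormalization. None of these choices is specified in the abstract, and the names `project`, `contrast_layers`, and the toy values are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def project(hidden, unembed):
    """Project a hidden state to vocabulary logits.

    `unembed` holds one weight row per vocabulary token (assumed to be
    shared across layers in this sketch).
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]


def contrast_layers(late_hidden, early_hidden, unembed):
    """Next-token distribution from contrasting a late and an early layer.

    Assumption: the contrast is log p_late - log p_early, renormalized
    with a softmax. The early layer is fixed by the caller.
    """
    late = log_softmax(project(late_hidden, unembed))
    early = log_softmax(project(early_hidden, unembed))
    scores = [l - e for l, e in zip(late, early)]
    return [math.exp(lp) for lp in log_softmax(scores)]


# Toy example: 3-token vocabulary, 2-dimensional hidden states.
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
early_hidden = [1.0, 0.0]  # early layer already favors tokens 0 and 2
late_hidden = [1.0, 1.0]   # late layer adds evidence for tokens 1 and 2

probs = contrast_layers(late_hidden, early_hidden, unembed)
print(probs)
```

In this toy case token 0 receives the same logit from both layers, so the contrast demotes it relative to tokens 1 and 2, whose logits grow between the early and the late layer. A practical implementation would also need to decide which early layer to contrast against and how to treat tokens that the late layer itself considers implausible; the sketch leaves both open.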