While large language models (LLMs) perform exceptionally well across a wide range of tasks after human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as \emph{hallucination}. In this paper, we demonstrate that hallucinations can be mitigated by verifying and minimizing the inconsistency between the external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge and uses them to evaluate the knowledge boundaries of foundation LLMs. KCA then applies several strategies to handle the instances in the alignment data that exhibit knowledge inconsistency. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, using foundation LLMs of varying backbones and scales. These results confirm that reducing knowledge inconsistency is an effective way to mitigate hallucinations. Our code, model weights, and data are publicly available at \url{https://github.com/fanqiwan/KCA}.