Retrieval-Augmented Generation (RAG) models are designed to incorporate external knowledge, reducing hallucinations caused by insufficient parametric (internal) knowledge. However, even when the retrieved content is accurate and relevant, RAG models can still hallucinate by generating outputs that conflict with the retrieved information. Detecting such hallucinations requires disentangling how Large Language Models (LLMs) utilize external and parametric knowledge. Current detection methods often focus on only one of these mechanisms, or fail to decouple their intertwined effects, making accurate detection difficult. In this paper, we investigate the internal mechanisms behind hallucinations in RAG scenarios. We find that hallucinations occur when the Knowledge FFNs in LLMs overemphasize parametric knowledge in the residual stream, while the Copying Heads fail to effectively retain or integrate external knowledge from the retrieved content. Based on these findings, we propose ReDeEP, a novel method that detects hallucinations by decoupling the LLM's utilization of external context and parametric knowledge. Our experiments show that ReDeEP significantly improves RAG hallucination detection accuracy. Additionally, we introduce AARF, which mitigates hallucinations by modulating the contributions of the Knowledge FFNs and Copying Heads.