Large Language Models (LLMs) have shown impressive capabilities but still suffer from the issue of hallucinations. A significant type of this issue is the false premise hallucination, which we define as the phenomenon in which LLMs generate hallucinated text when confronted with false premise questions. In this paper, we perform a comprehensive analysis of the false premise hallucination and elucidate its internal working mechanism: a small subset of attention heads (which we designate as false premise heads) disturb the knowledge extraction process, leading to the occurrence of false premise hallucination. Based on our analysis, we propose \textbf{FAITH} (\textbf{F}alse premise \textbf{A}ttention head constra\textbf{I}ning for mi\textbf{T}igating \textbf{H}allucinations), a novel and effective method to mitigate false premise hallucinations. It constrains the false premise attention heads during the model inference process. Impressively, extensive experiments demonstrate that constraining only approximately $1\%$ of the attention heads in the model yields a notable improvement of nearly $20\%$ in model performance.
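As a concrete illustration of what constraining attention heads at inference time can look like, the sketch below zeroes the outputs of a chosen subset of heads inside a standard multi-head self-attention layer. This is a minimal PyTorch sketch under stated assumptions, not the FAITH implementation: the class name, the toy dimensions, and the suppressed head indices are hypothetical placeholders, and both the identification of false premise heads and the exact form of the constraint used by FAITH are described in the paper, not reproduced here.

\begin{verbatim}
# Minimal sketch (NOT the authors' released code): suppress a chosen
# subset of attention heads at inference time by zeroing their outputs.
# Head indices below are hypothetical; FAITH selects the false premise
# heads via its own analysis, which this sketch does not cover.
import torch
import torch.nn as nn

class HeadConstrainedAttention(nn.Module):
    """Multi-head self-attention whose selected heads can be suppressed."""

    def __init__(self, d_model: int, n_heads: int, constrained_heads=()):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # 1.0 keeps a head, 0.0 suppresses it; registered as a buffer
        # so the mask moves with the module across devices.
        mask = torch.ones(n_heads)
        mask[list(constrained_heads)] = 0.0
        self.register_buffer("head_mask", mask.view(1, n_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each of q, k, v to (batch, heads, seq, d_head).
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = attn @ v                     # (batch, heads, seq, d_head)
        ctx = ctx * self.head_mask         # zero the constrained heads
        return self.out(ctx.transpose(1, 2).reshape(b, t, d))

# Suppress heads 2 and 5 (hypothetical indices) out of 8 in one layer.
layer = HeadConstrainedAttention(d_model=64, n_heads=8,
                                 constrained_heads=(2, 5))
print(layer(torch.randn(1, 10, 64)).shape)  # torch.Size([1, 10, 64])
\end{verbatim}

In practice, the same effect can be obtained on a pretrained model without redefining its layers, e.g. by registering forward hooks on the per-layer attention modules and masking the per-head context tensors there; zeroing is just one possible constraint, and softer variants (scaling a head's output down rather than removing it) fit the same mask-based pattern.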