Hallucination is a central failure mode in large language models (LLMs). We study hallucinated answers to questions such as "Which instrument did Glenn Gould play?", posed for synthetic entities that are unknown to the model. Surprisingly, we find that medium-sized models like Gemma-7B-IT frequently hallucinate, i.e., they have difficulty recognizing that the hallucinated fact is not part of their knowledge. We hypothesize that an important factor in causing these hallucinations is the linearity of the relation: linear relations tend to be stored more abstractly, making it difficult for the LLM to assess its knowledge, whereas the facts of nonlinear relations tend to be stored more directly, making knowledge assessment easier. To investigate this hypothesis, we create SyntHal, a dataset of 6000 synthetic entities spanning six relations. In experiments with four models, we determine, for each relation, the hallucination rate on SyntHal and also measure the relation's linearity using $\Delta\cos$. We find a strong correlation ($r \in [0.78, 0.82]$) between relational linearity and hallucination rate, providing evidence for our hypothesis that the underlying storage of a relation's triples is a factor in how well a model can self-assess its knowledge. This finding has implications for how to manage hallucination behavior and suggests new research directions for improving the representation of factual knowledge in LLMs.