Existing large language models (LLMs) are known for generating "hallucinated" content, namely fabricated text presenting plausible-looking yet unfounded facts. To identify when these hallucinations occur, we examine the properties of the generated text in the embedding space. Specifically, we draw inspiration from dynamic mode decomposition (DMD) to analyze how text embeddings evolve across sentences. We empirically demonstrate that the spectrum of sentence embeddings over paragraphs is consistently low-rank for the generated text, unlike that of the ground-truth text. Importantly, we find that evaluation cases with LLM hallucinations correspond to ground-truth embedding patterns that have a higher number of modes, which are poorly approximated by the few modes associated with the LLM embedding patterns. In analogy to near-field electromagnetic evanescent waves, the embedding DMD eigenmodes of generated text with hallucinations vanish quickly across sentences, as opposed to those of the ground-truth text. This suggests that the hallucinations result from both the generation techniques and the underlying representation.
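To make the DMD step concrete, the following is a minimal sketch of the standard exact DMD algorithm applied to a matrix whose columns are consecutive sentence embeddings. It is not the authors' pipeline: the function name `dmd`, the rank tolerance `tol`, and the synthetic stand-in for embeddings are all assumptions made for illustration, and the abstract does not say which embedding model or DMD variant was used. The singular values of the snapshot matrix indicate how many modes the sequence carries, and an eigenvalue with magnitude below 1 corresponds to a mode that decays geometrically from one sentence to the next.

```python
import numpy as np


def dmd(X, rank=None, tol=1e-10):
    """Exact DMD of a snapshot matrix X with shape (dim, n_sentences).

    Hypothetical helper: columns of X are assumed to be consecutive
    sentence embeddings of one paragraph.
    """
    # Pair each snapshot with its successor: X2 ~ A @ X1.
    X1, X2 = X[:, :-1], X[:, 1:]

    # Truncated SVD of the first snapshot block.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if rank is None:
        # Assumed rule: keep singular values above a relative tolerance.
        rank = int(np.sum(s > tol * s[0]))
    U, s, V = U[:, :rank], s[:rank], Vh[:rank, :].conj().T

    # B = X2 V S^{-1}; dividing by s scales column j by 1 / s[j].
    B = (X2 @ V) / s

    # Reduced operator A_tilde = U^H X2 V S^{-1} and its eigendecomposition.
    A_tilde = U.conj().T @ B
    eigvals, W = np.linalg.eig(A_tilde)

    # Exact DMD modes in the original embedding space.
    modes = B @ W
    return eigvals, modes, s


# Synthetic stand-in for embeddings (NOT data from the paper):
# two decaying modes embedded in a 16-dimensional space over 8 "sentences".
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 2)))
steps = np.arange(8)
latent = np.vstack([0.9 ** steps, 0.5 ** steps])
X = Q @ latent

eigvals, modes, singular_values = dmd(X)
print("retained modes:", eigvals.size)
print("eigenvalue magnitudes:", np.sort(np.abs(eigvals)))
```

With real text, `X` would be built by stacking per-sentence embedding vectors as columns, and the generated and ground-truth paragraphs would each get their own decomposition for comparison.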