Hallucinations -- fluent but factually incorrect responses -- pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small LLMs from a geometric perspective, starting from the hypothesis that when a model generates multiple responses to the same prompt, the genuine ones cluster more tightly in the embedding space. We prove this hypothesis and, leveraging this geometric insight, we further show that a consistent level of separability between genuine and hallucinated responses can be achieved. Building on this result, we introduce a label-efficient propagation method that classifies large collections of responses from just 30-50 annotations, achieving F1 scores above 90%. By framing hallucinations geometrically in the embedding space, our findings complement traditional knowledge-centric and single-response evaluation paradigms, paving the way for further research.
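The clustering and label-propagation idea can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's actual pipeline: synthetic 2D points stand in for response embeddings (a tight cluster for genuine responses, a spread-out one for hallucinated ones), and a nearest-annotated-neighbor rule stands in for the propagation method.

```python
import random

random.seed(0)

# Synthetic 2D "embeddings": genuine responses cluster tightly,
# hallucinated ones are more dispersed (the paper's geometric hypothesis).
genuine = [(random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)) for _ in range(100)]
halluc = [(random.gauss(4.0, 1.5), random.gauss(4.0, 1.5)) for _ in range(100)]
points = genuine + halluc
truth = [1] * 100 + [0] * 100  # 1 = genuine, 0 = hallucinated

# Annotate a small seed set (20 points here, standing in for the 30-50
# annotations mentioned in the abstract).
seeds = {i: truth[i] for i in list(range(10)) + list(range(100, 110))}

def propagate(p):
    # Hypothetical propagation rule: copy the label of the nearest annotated point.
    best = min(seeds, key=lambda i: (points[i][0] - p[0]) ** 2
                                    + (points[i][1] - p[1]) ** 2)
    return seeds[best]

pred = [propagate(p) for p in points]

# F1 on the genuine class.
tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
f1 = 2 * tp / (2 * tp + fp + fn)
print(f"F1 on genuine class: {f1:.2f}")
```

Because the two clusters are geometrically separable, even this naive rule labels the full collection well from a handful of annotations; the paper's method and embeddings are, of course, more sophisticated.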