Hallucinations -- fluent but factually incorrect responses -- pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small LLMs from a geometric perspective, starting from the hypothesis that, when a model generates multiple responses to the same prompt, the genuine ones cluster more tightly in the embedding space. We confirm this hypothesis and, leveraging the geometric insight, show that a consistent level of separability between genuine and hallucinated responses can be achieved. Building on this result, we introduce a label-efficient propagation method that classifies large collections of responses from just 30-50 annotations, achieving F1 scores above 90%. By framing hallucinations geometrically in the embedding space, our findings complement traditional knowledge-centric and single-response evaluation paradigms, paving the way for further research.
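The core idea above can be illustrated with a toy sketch. This is not the paper's implementation: the embeddings are simulated 2-D Gaussians (the genuine cluster deliberately tighter, per the hypothesis), the annotation budget of 40 matches the stated 30-50 range, and the propagation rule is a simple nearest-labeled-neighbor assignment chosen for illustration only. Cluster centers, scales, and sizes are arbitrary assumptions.

```python
# Toy sketch (illustrative, not the paper's method): separate "genuine"
# from "hallucinated" responses by propagating ~40 annotations through
# a simulated embedding space where genuine responses cluster tightly.
import numpy as np

rng = np.random.default_rng(0)

# Simulated embeddings: genuine responses form a tight cluster (label 1),
# hallucinated ones are more spread out (label 0). Centers/scales are
# arbitrary assumptions for the sketch.
genuine = rng.normal(loc=0.0, scale=0.3, size=(300, 2))
halluc = rng.normal(loc=2.0, scale=1.0, size=(300, 2))
X = np.vstack([genuine, halluc])
y_true = np.array([1] * 300 + [0] * 300)

# Annotate only 40 responses, mimicking the 30-50 annotation budget.
seed_idx = rng.choice(len(X), size=40, replace=False)

# Nearest-labeled-neighbor propagation: each response takes the label
# of its closest annotated response (one simple propagation rule).
dists = np.linalg.norm(X[:, None, :] - X[seed_idx][None, :, :], axis=2)
y_pred = y_true[seed_idx][dists.argmin(axis=1)]

# F1 score for the "genuine" class.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
f1 = 2 * tp / (2 * tp + fp + fn)
print(f"F1 = {f1:.3f}")
```

Because the genuine cluster is tight and well separated in this synthetic setup, even this crude propagation rule recovers most labels from the small annotation budget; the paper's actual method and real embeddings are, of course, more involved.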