Hallucinations (plausible but factually incorrect responses) pose a major challenge to the reliability of Large Language Models (LLMs), especially in multi-step or agentic settings. Existing work largely frames hallucinations as a consequence of missing knowledge. We show instead that models produce hallucinated answers even when the relevant factual knowledge is present, which points to retrieval instability rather than knowledge gaps. Building on this observation, we introduce APORIA (Aggregate Prompt-wise Observation Retrieving Instability via Asymmetry; the name refers to the state of puzzlement-in-contradiction that hallucinations embody), a geometric framework that studies repeated responses to the same prompt in sentence-embedding space. Our central hypothesis is that genuine responses cluster more tightly than hallucinated ones; we validate this empirically and show that, after Fisher projection, the two response classes become consistently separable. We exploit this geometric asymmetry in APORIA-LP, an efficient label-propagation method that classifies large collections of responses from as few as 30--50 annotations, achieving F1 scores above 90% across ten small LLMs. To support further research, we release SOCRATES-300K, a fully labelled dataset of 300,000 responses, together with the code for dataset generation and result reproduction. Our key contribution, a geometric view of hallucinations in embedding space, complements traditional knowledge-centric and single-response evaluation paradigms.
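The pipeline summarised in the abstract (compare cluster tightness, project with a Fisher discriminant, propagate a few labels) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how APORIA-LP builds its graph or propagates labels, so the kNN-graph averaging below is an assumed stand-in. Synthetic Gaussian vectors replace real sentence embeddings, and every function name and parameter (`mean_pairwise_distance`, `fisher_direction`, `propagate_labels`, `k`, `ridge`, the cluster means and spreads) is hypothetical.

```python
import numpy as np


def mean_pairwise_distance(X):
    """Average Euclidean distance over all distinct pairs of rows in X."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    return dists.sum() / (n * (n - 1))  # diagonal is zero, so only i != j counts


def fisher_direction(X, y, ridge=1e-3):
    """Two-class Fisher direction w = Sw^{-1} (mu1 - mu0), ridge-regularised."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += ridge * np.eye(X.shape[1])  # keeps Sw invertible with few labels
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


def propagate_labels(Z, labelled_idx, labelled_y, k=10, n_iter=50):
    """Assumed scheme: average neighbour scores on a kNN graph, clamping known labels."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    scores = np.full(len(Z), 0.5)          # 0.5 means "unknown"
    scores[labelled_idx] = labelled_y
    for _ in range(n_iter):
        scores = scores[nbrs].mean(axis=1)
        scores[labelled_idx] = labelled_y  # re-clamp the annotated points
    return (scores > 0.5).astype(int)


# Synthetic stand-in for sentence embeddings (assumption, not real data):
# class 0 = "genuine" (tight cluster), class 1 = "hallucinated" (diffuse cluster).
rng = np.random.default_rng(0)
dim, n_per_class, n_labelled_per_class = 8, 200, 20
genuine = rng.normal(0.0, 0.1, size=(n_per_class, dim))
hallucinated = rng.normal(1.5, 0.6, size=(n_per_class, dim))
X = np.vstack([genuine, hallucinated])
y = np.repeat([0, 1], n_per_class)

# Step 1: the tightness asymmetry between the two classes.
tight = mean_pairwise_distance(genuine)
loose = mean_pairwise_distance(hallucinated)

# Step 2: Fisher projection estimated from 40 annotated responses.
labelled_idx = np.concatenate([
    rng.choice(n_per_class, n_labelled_per_class, replace=False),
    n_per_class + rng.choice(n_per_class, n_labelled_per_class, replace=False),
])
w = fisher_direction(X[labelled_idx], y[labelled_idx])
Z = (X @ w)[:, None]  # 1-D projected coordinates, kept as a column

# Step 3: propagate the 40 labels to all 400 responses.
pred = propagate_labels(Z, labelled_idx, y[labelled_idx])
accuracy = (pred == y).mean()
print(f"genuine spread={tight:.3f}, hallucinated spread={loose:.3f}, accuracy={accuracy:.3f}")
```

The toy data is built so that the two classes differ in both location and spread, which makes the example easy by construction; it says nothing about the F1 scores reported for real models. With real data, `X` would hold sentence embeddings of repeated responses to the same prompt, and the choice of `k`, the ridge term, and the propagation rule would all need tuning.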