From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas such as computer vision. Treating next-token prediction in language models as a classification task over the vocabulary allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample detectors that achieve strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
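To make the classification framing concrete, the following is a minimal sketch of how two classic OOD scores from the vision literature, maximum softmax probability (MSP) and the energy score, could be computed on per-token logits and aggregated over a single generated sequence. It is not the detector proposed in this work, whose modifications are not described in the abstract. The function names, the aggregation choices (minimum MSP, mean energy), and the toy logits are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def max_softmax_prob(logits: Sequence[float]) -> float:
    """MSP score: probability of the most likely token (higher = more confident)."""
    return math.exp(max(logits) - log_sum_exp(logits))


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Energy score: -T * logsumexp(logits / T) (lower = more in-distribution)."""
    scaled = [x / temperature for x in logits]
    return -temperature * log_sum_exp(scaled)


def sequence_scores(token_logits: List[Sequence[float]]) -> Dict[str, float]:
    """Aggregate per-token scores over one generated sequence.

    Hypothetical aggregation: the least confident token (min MSP) and the
    average energy. Other choices are equally plausible.
    """
    msp = [max_softmax_prob(step) for step in token_logits]
    energy = [energy_score(step) for step in token_logits]
    return {"min_msp": min(msp), "mean_energy": sum(energy) / len(energy)}


# Toy example with a 3-token vocabulary: one confident step, one uniform step.
toy_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
scores = sequence_scores(toy_logits)
```

A sequence would then be flagged when a score crosses a threshold chosen on held-out data. Because both scores need only the logits of one generation, a detector built this way is training-free and single-sample in the sense used above.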

