Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction

Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three data benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.

翻译：大语言模型（LLMs）经常生成看似合理但非事实的内容，这一现象被称为幻觉。现有检测方法通常依赖计算成本高昂的基于采样的一致性检查或外部知识检索，我们提出一种新方法，将LLM视为黑盒动力系统。通过嵌入模型将LLM响应投影到高维流形上，我们将生成的向量序列表征为模型隐状态空间动力学的可观实现。利用库普曼算子理论，我们拟合事实与幻觉状态下的转移算子，并基于各自的预测误差定义微分残差分数。为适应不同用户需求和领域特异性敏感度，我们引入偏好感知校准机制，该机制基于少量演示样例优化分类阈值。这种方法能够在单次样本传递中实现低成本的幻觉检测，无需二次采样或外部知识库支撑。在三个数据基准上的广泛测试表明，我们的方法以更低的资源开销达到了最优性能。

相关内容

黑盒

关注 1

在科学，计算和工程学中，黑盒是一种设备，系统或对象，可以根据其输入和输出（或传输特性）对其进行查看，而无需对其内部工作有任何了解。它的实现是“不透明的”（黑色）。几乎任何事物都可以被称为黑盒：晶体管，引擎，算法，人脑，机构或政府。为了使用典型的“黑匣子方法”来分析建模为开放系统的事物，仅考虑刺激/响应的行为，以推断（未知）盒子。该黑匣子系统的通常表示形式是在该方框中居中的数据流程图。黑盒的对立面是一个内部组件或逻辑可用于检查的系统，通常将其称为白盒（有时也称为“透明盒”或“玻璃盒”）。

《缓解大语言模型（LLMs）幻觉：面向应用的检索增强生成（RAG）、推理与智能体系统综述》

专知会员服务

24+阅读 · 2025年10月29日

LLMs与生成式智能体模拟：复杂系统研究的新范式

专知会员服务

28+阅读 · 2025年6月15日

142页DeepSeek-R1 思维链技术：让我们一起<思考>大语言模型（LLM）的推理能力

专知会员服务

48+阅读 · 2025年4月12日

【NeurIPS 2024】HaloScope：利用未标记的大型语言模型生成进行幻觉检测

专知会员服务

20+阅读 · 2024年9月27日