Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. Existing detection methods typically rely on computationally expensive sampling-based consistency checks or on external knowledge retrieval. We instead propose a method that treats the LLM as a black-box dynamical system. We project LLM responses into a high-dimensional manifold via an embedding model and treat the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit separate transition operators for the factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold from a small set of demonstrations. The approach detects hallucinations at low cost from a single sampled response, with no secondary sampling or external grounding. Experiments on three benchmarks show that our method achieves state-of-the-art performance with reduced resource overhead.
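The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: each response is assumed to yield a sequence of embedding vectors (for example one per sentence), the Koopman operator is approximated by a ridge-regularized linear least-squares fit in the style of dynamic mode decomposition, the differential score is taken to be the difference of the two mean squared one-step prediction errors, and the preference is modelled as a pair of hypothetical misclassification costs (`fn_cost`, `fp_cost`). Synthetic trajectories from two linear systems stand in for real embeddings.

```python
import numpy as np

def fit_koopman(sequences, ridge=1e-3):
    """Ridge least-squares estimate of K such that x_{t+1} ~= K @ x_t (assumed linear form)."""
    X = np.vstack([s[:-1] for s in sequences])  # states at step t, shape (N, d)
    Y = np.vstack([s[1:] for s in sequences])   # states at step t+1, shape (N, d)
    d = X.shape[1]
    # Solve (X^T X + ridge*I) A = X^T Y, where rows satisfy y ~= x @ A, so K = A^T.
    A = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)
    return A.T

def residual(K, seq):
    """Mean squared one-step prediction error of operator K on one sequence."""
    pred = seq[:-1] @ K.T
    return float(np.mean(np.sum((seq[1:] - pred) ** 2, axis=1)))

def differential_score(K_fact, K_hall, seq):
    """Positive when the factual operator predicts worse than the hallucinated one."""
    return residual(K_fact, seq) - residual(K_hall, seq)

def calibrate_threshold(scores, labels, fn_cost=1.0, fp_cost=1.0):
    """Choose the threshold minimizing weighted errors on demonstrations (label 1 = hallucinated)."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    s = np.unique(scores)  # sorted unique values
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]))
    best_t, best_cost = candidates[0], np.inf
    for t in candidates:
        flagged = scores > t
        cost = fn_cost * np.sum(~flagged & (labels == 1)) + fp_cost * np.sum(flagged & (labels == 0))
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t

# --- Synthetic stand-in for embedded responses (purely illustrative) ---
def rotation_system(theta, d=4, decay=0.95):
    c, s = np.cos(theta), np.sin(theta)
    return decay * np.kron(np.eye(d // 2), np.array([[c, -s], [s, c]]))

def simulate(A, n_seq, length, rng, noise=0.01):
    seqs = []
    for _ in range(n_seq):
        x = rng.normal(size=A.shape[0])
        traj = [x]
        for _ in range(length - 1):
            x = A @ x + noise * rng.normal(size=x.shape)
            traj.append(x)
        seqs.append(np.array(traj))
    return seqs

rng = np.random.default_rng(0)
A_fact, A_hall = rotation_system(0.3), rotation_system(1.2)

# Fit one operator per regime from labelled training sequences.
K_fact = fit_koopman(simulate(A_fact, 50, 12, rng))
K_hall = fit_koopman(simulate(A_hall, 50, 12, rng))

# Score a small demonstration set and calibrate the decision threshold on it.
demo_fact = [differential_score(K_fact, K_hall, s) for s in simulate(A_fact, 5, 12, rng)]
demo_hall = [differential_score(K_fact, K_hall, s) for s in simulate(A_hall, 5, 12, rng)]
threshold = calibrate_threshold(demo_fact + demo_hall, [0] * 5 + [1] * 5, fn_cost=2.0, fp_cost=1.0)

# A new response is flagged when its score exceeds the calibrated threshold.
new_seq = simulate(A_hall, 1, 12, rng)[0]
is_hallucination = differential_score(K_fact, K_hall, new_seq) > threshold
```

In this sketch, scoring a new response needs only its own embedding sequence and two matrix products, which is consistent with the single-sample claim. Raising `fn_cost` relative to `fp_cost` pushes the threshold toward flagging more responses when the demonstration scores overlap.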