Large language models (LLMs) frequently generate confident yet inaccurate responses, posing significant risks for deployment in safety-critical domains. We present a novel approach to detecting model hallucination through systematic analysis of information flow across model layers when processing inputs with insufficient or ambiguous context. Our investigation reveals that hallucination manifests as deficiencies in the usable information transmitted between layers. Whereas existing approaches primarily analyze final-layer outputs, we demonstrate that tracking cross-layer information dynamics ($\mathcal{L}$I) provides robust indicators of model reliability, accounting for both information gain and loss during computation. $\mathcal{L}$I improves model reliability and integrates directly with off-the-shelf LLMs, requiring no additional training or architectural modifications.
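As a rough illustration of what tracking layer-wise information dynamics can look like in practice, the sketch below probes each layer's hidden state with the model's own unembedding (a logit-lens style probe) and records how the next-token entropy changes from layer to layer. This is a minimal sketch under assumptions of our own, not the paper's $\mathcal{L}$I metric: the model name, the entropy-based proxy, and the per-layer probing scheme are all illustrative choices.

```python
# Minimal sketch: logit-lens style probing of per-layer information dynamics.
# Assumption: entropy of the projected next-token distribution serves as a
# crude proxy for usable information at each layer (not the paper's metric).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of the fictional country Zubrowka is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, hidden]
entropies = []
for h in out.hidden_states:
    # Project the last position of each layer through the final norm + LM head.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)
    entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum().item())

# Layer-to-layer entropy change: negative values suggest information gain
# (the next-token distribution sharpens), positive values suggest loss.
deltas = [b - a for a, b in zip(entropies, entropies[1:])]
for i, d in enumerate(deltas, start=1):
    print(f"layer {i:2d}: delta-entropy = {d:+.3f}")
```

In this sketch, an input with insufficient context (such as the fabricated entity above) would be expected to show weaker layer-to-layer sharpening than a well-grounded query; the actual $\mathcal{L}$I indicator is defined in the body of the paper.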