Vision-language models (VLMs) now rival human performance on many multimodal tasks, yet they still hallucinate objects or generate unsafe text. Current hallucination detectors, such as single-token linear probing (LP) and PTrue, typically analyze only the logit of the first generated token, or just its highest-scoring component, and so overlook richer signals embedded in the early token distributions. We show that analyzing the complete sequence of early logits can provide substantially more diagnostic information. Hallucinations may emerge only after several tokens, as subtle inconsistencies accumulate over time. By analyzing the Kullback-Leibler (KL) divergence between logits corresponding to hallucinated and non-hallucinated tokens, we show that later-token logits are important for accurately capturing the reliability dynamics of VLMs. In response, we introduce Multi-Token Reliability Estimation (MTRE), a lightweight, white-box method that aggregates logits from the first ten tokens using multi-token log-likelihood ratios and self-attention. Despite the challenges posed by large vocabulary sizes and long logit sequences, MTRE remains efficient and tractable. Across MAD-Bench, MM-SafetyBench, MathVista, and four compositional-geometry benchmarks, MTRE achieves a 9.4% gain in accuracy and a 14.8% gain in AUROC over standard detection methods, establishing a new state of the art in hallucination detection for open-source VLMs.
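To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the MTRE implementation. It assumes logits arrive as arrays of shape (T, V) for T token positions and a vocabulary of size V, and that some per-token probe (not shown, and not specified in the abstract) has already produced a reliability probability for each of the first tokens. The function names `per_position_kl` and `multi_token_llr` are hypothetical, and the self-attention aggregation is omitted entirely.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def per_position_kl(logits_p, logits_q):
    """KL(P_t || Q_t) at each token position t.

    Both inputs have shape (T, V). Hypothetical helper: it compares two
    logit sequences position by position, e.g. one taken from a
    hallucinated response and one from a non-hallucinated response.
    """
    log_p = log_softmax(np.asarray(logits_p, dtype=float))
    log_q = log_softmax(np.asarray(logits_q, dtype=float))
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)


def multi_token_llr(token_probs, eps=1e-6):
    """Sum of per-token log-likelihood ratios log(p_t / (1 - p_t)).

    Assumption: token_probs[t] is a probe's estimate that the response
    is reliable given the logits at position t. A positive total favors
    "reliable", a negative total favors "hallucinated".
    """
    p = np.clip(np.asarray(token_probs, dtype=float), eps, 1.0 - eps)
    return float(np.sum(np.log(p) - np.log1p(-p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V = 10, 50  # ten tokens as in the abstract; toy vocabulary size
    a = rng.normal(size=(T, V))
    b = rng.normal(size=(T, V))
    print(per_position_kl(a, b))  # one non-negative value per position
    print(multi_token_llr(rng.uniform(0.4, 0.9, size=T)))
```

Summing log-likelihood ratios treats the per-token estimates as independent evidence, which is a simplifying assumption of this sketch; the abstract only states that MTRE combines multi-token log-likelihood ratios with self-attention.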