Hallucinations in Large Vision-Language Models (LVLMs) pose significant security and reliability risks in real-world applications. Inspired by the observation that humans are more error-prone when uncertain or hesitant, we investigate how instability in a model's internal knowledge contributes to LVLM hallucinations. We conduct extensive empirical analyses from three perspectives, namely attention heads, model layers, and decoding tokens, and identify three key hallucination patterns: (i) visual activation drift across attention heads, (ii) pronounced knowledge fluctuations across layers, and (iii) visual focus distraction between neighboring output tokens. Building on these findings, we propose Stability-Aware Knowledge-Enhanced Decoding (SAKED), which introduces a layer-wise Knowledge Stability Score (KSS) to quantify knowledge stability throughout the model. By contrasting the most stability-aware and stability-agnostic layers, SAKED suppresses decoding noise and dynamically leverages the most reliable internal knowledge for faithful token generation. Moreover, SAKED is training-free and can be seamlessly integrated into different architectures. Extensive experiments demonstrate that SAKED achieves state-of-the-art performance for hallucination mitigation across various models, tasks, and benchmarks.
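To make the decoding idea concrete, the following is a minimal, self-contained sketch of layer-contrastive decoding guided by a per-layer stability score. The exact KSS definition and the contrastive combination rule are not specified in the abstract; here the score is assumed (for illustration only) to be the negative Jensen-Shannon divergence between a layer's early-exit token distribution and its neighbors', and the most stable layer is contrasted against the least stable one. Function names and the `alpha` weight are hypothetical.

```python
# Illustrative sketch only: KSS and the contrast rule below are assumptions,
# not the paper's actual formulation.
import torch
import torch.nn.functional as F


def knowledge_stability_scores(layer_logits: torch.Tensor) -> torch.Tensor:
    """layer_logits: [num_layers, vocab] early-exit logits for the next token.

    Returns a per-layer stability score (higher = more stable), assumed here to be
    the negative average JS divergence to neighboring layers' distributions."""
    probs = F.softmax(layer_logits, dim=-1)                      # [L, V]
    scores = torch.zeros(probs.size(0))
    for l in range(probs.size(0)):
        neighbors = [i for i in (l - 1, l + 1) if 0 <= i < probs.size(0)]
        div = 0.0
        for n in neighbors:
            m = 0.5 * (probs[l] + probs[n])                      # mixture distribution
            # JS(P_l || P_n) = 0.5 * KL(P_l || M) + 0.5 * KL(P_n || M)
            div += 0.5 * (F.kl_div(m.log(), probs[l], reduction="sum")
                          + F.kl_div(m.log(), probs[n], reduction="sum"))
        scores[l] = -div / len(neighbors)                        # less fluctuation -> higher score
    return scores


def stability_contrastive_logits(layer_logits: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Contrast the most stable layer's distribution against the least stable one,
    amplifying knowledge that is consistent across layers before sampling."""
    kss = knowledge_stability_scores(layer_logits)
    stable, unstable = kss.argmax(), kss.argmin()
    log_p_stable = F.log_softmax(layer_logits[stable], dim=-1)
    log_p_unstable = F.log_softmax(layer_logits[unstable], dim=-1)
    return log_p_stable + alpha * (log_p_stable - log_p_unstable)


if __name__ == "__main__":
    # Toy usage on random per-layer logits standing in for an LVLM's early-exit heads.
    logits = torch.randn(32, 32000)          # 32 layers, 32k-token vocabulary
    next_token = stability_contrastive_logits(logits).argmax()
    print(int(next_token))
```

Because the procedure only reorders and reweights logits already produced by the model, it requires no additional training, which is consistent with the training-free property claimed above.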