Multimodal Foundation Models are increasingly used as reasoning agents, which makes reliability (knowing when a model may hallucinate) critical. A common intuition, which we call the Attention-Confidence Assumption, holds that reliability follows from "structural" visual perception: tight attention on relevant regions should signal a trustworthy answer, while scattered attention signals confusion. We challenge this assumption through the VLM Reliability Probe (VRP), a systematic cross-family study of reliability signals in contemporary Vision-Language Models (VLMs). We introduce two structural-attention metrics, cluster count (C_k) and spatial entropy (H_s), to quantify the visual encoder's gaze, and we track the change in spatial entropy across layers (ΔH_s). This reveals a "Symbolic Detachment": models often "Early Lock" onto visual features only to diffuse their attention in later layers, severing early perception from final generation. Contrary to the grounding hypothesis, we find a "Cluster Failure": spatial attention has near-zero correlation with accuracy (R ≈ 0.001). Instead, reliability is a phenomenon of generation dynamics and internal-state distributions. Self-Consistency, the agreement rate across sampled reasoning paths, is the dominant predictor of truth (R = 0.429). Scaling causal interventions exposes a sharp architectural divergence: LLaVA locks its prediction in a fragile late-stage bottleneck, whereas PaliGemma and Qwen2-VL distribute reliability globally and stay resilient even when roughly 50% or more of their most predictive layer is destroyed. For current VLMs, reliability signals are detached from visual grounding maps and are best inferred from generation-time dynamics and hidden-state probes.
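To make the two signal families concrete, the following is a minimal sketch rather than the study's implementation. The abstract does not give exact definitions, so the sketch assumes that spatial entropy (H_s) is the Shannon entropy of a patch-level attention map normalized to sum to one, that ΔH_s is the entropy of the last layer minus that of the first, and that Self-Consistency is the share of sampled answers matching the most frequent one. All function names are hypothetical.

```python
import math
from collections import Counter


def spatial_entropy(attn):
    """Shannon entropy (in nats) of a 2D attention map.

    Assumption: the map is normalized to sum to 1 before the entropy is taken.
    Low values mean focused attention, high values mean diffuse attention.
    """
    flat = [w for row in attn for w in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    return -sum((w / total) * math.log(w / total) for w in flat if w > 0)


def entropy_shift(layer_maps):
    """Assumed form of Delta H_s: last-layer entropy minus first-layer entropy."""
    return spatial_entropy(layer_maps[-1]) - spatial_entropy(layer_maps[0])


def self_consistency(answers):
    """Fraction of sampled answers that agree with the most common answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


# Toy 2x2 attention maps: one fully focused, one fully diffuse.
focused = [[0.0, 0.0], [0.0, 1.0]]
diffuse = [[0.25, 0.25], [0.25, 0.25]]

# A positive shift mimics the "Early Lock" then diffusion pattern.
shift = entropy_shift([focused, diffuse])

# Four sampled answers to the same question; three agree.
agreement = self_consistency(["cat", "cat", "dog", "cat"])
```

In this toy case the focused map has zero entropy and the diffuse map has the maximum entropy for four patches, log 4, so the shift is positive. The agreement rate of 0.75 needs only the sampled outputs, not attention maps, which is why it can be computed for any model that supports sampling.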