Large Vision Language Models (LVLMs) show promise in medical applications, but their inability to faithfully ground responses in visual evidence raises serious concerns about clinical trustworthiness. Visual attribution methods are widely used to explain LVLM predictions, yet whether these explanations reflect the visual evidence underlying the model's decision remains largely unverified, because ground-truth annotations of a model's internal reasoning are typically unavailable. We address this question for chest X-ray (CXR) reasoning by developing a causal evaluation framework that retains only CXR visual question answering (CXR-VQA) samples whose expert-annotated region is verified, via counterfactual editing, to be causally responsible for the model's prediction. Applying this framework to 11 attribution methods, six open-source LVLMs, and two output modes (direct answer and step-by-step reasoning), we find that existing attribution methods often fail to identify the evidence LVLMs actually use. To address this failure, we propose MedFocus, a concept-based attribution method that localizes clinically meaningful anatomical regions via unbalanced optimal transport and measures their causal effect on model outputs through targeted interventions. MedFocus produces spatial, concept-level, and token-level attributions and substantially outperforms prior methods, taking a step toward more trustworthy attribution for medical LVLMs. Our data and code are available at https://github.com/gzxiong/medfocus/.
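The abstract names unbalanced optimal transport as the mechanism for localizing anatomical regions. As a rough illustration only, and not the MedFocus implementation, the sketch below runs a generic entropic unbalanced Sinkhorn (scaling) iteration with KL-relaxed marginals between hypothetical image-patch embeddings and concept embeddings. The function names `unbalanced_sinkhorn` and `concept_masks`, the cosine cost, the uniform marginals, and the `eps`/`rho` values are all illustrative assumptions. Because the marginals are only softly enforced, patches that match no concept (for example, background) are not forced to carry mass, which plausibly makes an unbalanced formulation a natural fit for region localization.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.1, rho=1.0, n_iter=500):
    """Generic entropic unbalanced OT with KL-relaxed marginals (scaling algorithm).

    a: source weights (n,), b: target weights (m,), cost: (n, m) cost matrix.
    eps: entropic regularization; rho: strength of the marginal KL penalties.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    fi = rho / (rho + eps)           # exponent < 1 softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    # Transport plan P = diag(u) K diag(v)
    return u[:, None] * K * v[None, :]


def concept_masks(patch_emb, concept_emb, **kw):
    """Hypothetical helper: soft spatial mask per concept from an unbalanced OT plan."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    cost = 1.0 - p @ c.T             # cosine distance, shape (n_patches, n_concepts)
    a = np.full(p.shape[0], 1.0 / p.shape[0])   # assumed uniform patch mass
    b = np.full(c.shape[0], 1.0 / c.shape[0])   # assumed uniform concept mass
    plan = unbalanced_sinkhorn(a, b, cost, **kw)
    # Rescale each concept column so its strongest patch has weight 1
    return plan / plan.max(axis=0, keepdims=True)


# Usage: 16 patches (a 4x4 grid) and 3 concepts with random 8-d embeddings
rng = np.random.default_rng(0)
masks = concept_masks(rng.normal(size=(16, 8)), rng.normal(size=(3, 8)))
grid = masks[:, 0].reshape(4, 4)     # soft mask for concept 0 over the patch grid
```

Raising `rho` pushes the plan toward the balanced (classic Sinkhorn) solution, while lowering it lets unmatched patches shed mass; how MedFocus sets these quantities is not specified in this abstract.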