Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, which leaves them vulnerable to spurious correlations and limits their clinical reliability. We identify three core challenges in medical CoT reasoning: adaptively triggering causal correction, constructing high-quality causal-spurious contrastive samples, and maintaining causal consistency across reasoning trajectories. To address these challenges, we propose MedCausalX, an end-to-end framework that explicitly models causal reasoning chains in medical VLMs. We first introduce the CRMed dataset, which provides fine-grained anatomical annotations, structured causal reasoning chains, and counterfactual variants that guide the learning of causal relationships beyond superficial correlations. Building on CRMed, MedCausalX employs a two-stage adaptive reflection architecture equipped with $\langle$causal$\rangle$ and $\langle$verify$\rangle$ tokens, enabling the model to decide autonomously when and how to perform causal analysis and verification. Finally, a trajectory-level causal correction objective, optimized through error-attributed reinforcement learning, refines the reasoning chain so that the model can distinguish genuine causal dependencies from shortcut associations. Extensive experiments on multiple benchmarks show that MedCausalX consistently outperforms state-of-the-art methods, improving diagnostic consistency by 5.4 points, reducing hallucination by more than 10 points, and achieving the highest spatial grounding IoU, thereby setting a new standard for causally grounded medical reasoning.