Medical Vision-Language Models (MedVLMs) excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. While reinforcement learning (RL) has been explored to enhance reasoning capabilities, existing approaches face critical mismatches: deep-reasoning data is scarce, cold-start initialization limits multi-specialty alignment, and standard RL algorithms fail to model the diversity of clinical reasoning. We propose MMedExpert-R1, a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and clinical-guideline reinforcement. We construct MMedExpert, a high-quality dataset of 10K samples spanning four specialties, each annotated with step-by-step reasoning traces. Our Domain-Specific Adaptation (DSA) builds specialty-specific LoRA modules to provide diverse initialization, while Guideline-Based Advantages (GBA) explicitly model distinct clinical reasoning perspectives, aligning the policy with real-world diagnostic strategies. Conflict-Aware Capability Integration then merges these specialized experts into a unified agent, ensuring robust multi-specialty alignment. Comprehensive experiments demonstrate state-of-the-art performance: our 7B model achieves 27.50 on MedXpert-MM and 83.03 on OmniMedVQA, establishing a robust foundation for reliable multimodal medical reasoning systems.
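To make the "conflict-aware" merging idea concrete, the sketch below shows one common way such interference between task-specific adapters can be resolved: keep, for each parameter, only the per-specialty delta contributions whose sign agrees with the dominant (summed) sign, then average them. This is a minimal illustrative sketch under our own assumptions; the function name `merge_specialty_deltas` and the sign-agreement rule are hypothetical and not taken from the paper, whose actual Conflict-Aware Capability Integration procedure may differ.

```python
import numpy as np

def merge_specialty_deltas(deltas, scale=1.0):
    """Conflict-aware merge of per-specialty parameter deltas (hypothetical sketch).

    For each parameter, determine the dominant sign across specialties,
    zero out contributions that conflict with it, and average the rest.
    """
    stacked = np.stack(deltas)                 # (num_specialties, num_params)
    dominant = np.sign(stacked.sum(axis=0))    # dominant sign per parameter
    agree = np.sign(stacked) == dominant       # which specialties agree
    masked = np.where(agree, stacked, 0.0)     # drop conflicting contributions
    counts = np.maximum(agree.sum(axis=0), 1)  # avoid division by zero
    return scale * masked.sum(axis=0) / counts

# Toy usage with three specialty "experts" over three parameters:
radiology  = np.array([ 0.4, -0.2,  0.1])
pathology  = np.array([ 0.6,  0.3, -0.1])
cardiology = np.array([-0.1,  0.2,  0.3])
merged = merge_specialty_deltas([radiology, pathology, cardiology])
# Each parameter averages only the sign-consistent contributions.
```

In practice, the deltas would be the flattened LoRA weight updates of each specialty expert, and the merged delta would be added back onto the shared base model.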