Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, limiting their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, in which general practitioners (GPs) and specialists interact in a fixed sequence. Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: a triage doctor that learns to assign patients to the appropriate specialties, and an attending physician that integrates the judgments of multiple specialists with its own knowledge to make final decisions. To address inconsistency in specialist outputs, we introduce a curriculum learning (CL)-guided RL strategy with dynamic entropy regulation, which progressively teaches the attending physician to balance imitating specialists with correcting their mistakes. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL outperforms both open-source and proprietary Med-LVLMs, achieving an average performance gain of 23.6% over strong baselines.