Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis

Echocardiographic diagnosis is vital for cardiac screening yet remains challenging. Existing echocardiography foundation models do not effectively capture the relationships between quantitative measurements and clinical manifestations, whereas medical reasoning multimodal large language models (MLLMs) require costly construction of detailed reasoning paths and remain ineffective at directly incorporating such echocardiographic priors into their reasoning. To address these limitations, we propose a novel approach, comprising the Cardiac Reasoning Template (CRT) and CardiacMind, that enhances MLLMs' echocardiographic reasoning by introducing a cardiologist-like mindset. Specifically, CRT provides stepwise canonical diagnostic procedures for complex cardiac diseases, streamlining reasoning path construction without the need for costly case-by-case verification. To incentivize MLLMs to reason under CRT, we develop CardiacMind, a new reinforcement learning scheme with three novel rewards: Procedural Quantity Reward (PQtR), Procedural Quality Reward (PQlR), and Echocardiographic Semantic Reward (ESR). PQtR promotes detailed reasoning, PQlR promotes integration of evidence across views and modalities, and ESR grounds stepwise descriptions in visual content. Our method achieves a 48% improvement in multiview echocardiographic diagnosis across 15 complex cardiac diseases and a 5% improvement on CardiacNet-PAH over prior methods. A user study of our method's reasoning outputs shows 93.33% clinician agreement that the reasoning logic is cardiologist-like. Our code will be made available.
