Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using \textit{Protocol-Guided Instruction Data Generation}, grounding interpretation in measurable ECG features together with monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with \textit{Interleaved Modality Dropout} to improve robustness and cross-modal consistency when either the ECG signal or the ECG image is missing. Third, we introduce \textit{Reinforcement Learning with ECG Diagnostic Evidence Rewards} to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not trust these outputs without independent verification. Code and data are publicly available \href{https://github.com/PKUDigitalHealth/ECG-R1}{here}, and an online platform is accessible \href{http://ai.heartvoice.com.cn/ECG-R1/}{here}.
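The abstract does not specify how \textit{Interleaved Modality Dropout} is implemented; the following is a minimal hypothetical sketch of one way such a mechanism could work, assuming the two modality branches (signal and image) produce feature arrays and that dropout alternates between them across training steps. All names and parameters here are illustrative, not the authors' actual implementation.

```python
import numpy as np

def interleaved_modality_dropout(signal, image, step, p_drop=0.3, rng=None):
    """Illustrative sketch: randomly zero out one modality during training.

    On alternating ("interleaved") steps, either the signal branch or the
    image branch is the candidate for dropout; with probability p_drop that
    modality's features are zeroed, forcing the model to rely on the
    remaining modality. Returns (signal, image, kept), where kept flags
    which modalities are still present as [signal_present, image_present].
    """
    rng = rng or np.random.default_rng()
    kept = [True, True]
    candidate = step % 2  # interleave: even steps may drop signal, odd steps image
    if rng.random() < p_drop:
        kept[candidate] = False
        if candidate == 0:
            signal = np.zeros_like(signal)
        else:
            image = np.zeros_like(image)
    return signal, image, kept
```

In this sketch the surviving modality is left untouched, so the network always receives at least one valid input, which is the usual design choice when the goal is robustness to a missing modality at inference time.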