Electrocardiogram (ECG) interpretation requires specialized expertise and often involves combining insights from ECG signals with complex clinical queries posed in natural language. The scarcity of labeled ECG data, coupled with the diversity of clinical inquiries, makes it difficult to develop robust and adaptable ECG diagnostic systems. This work introduces a multimodal meta-learning method for few-shot ECG question answering that addresses the shortage of labeled data while leveraging the knowledge encoded in large language models (LLMs). Our LLM-agnostic approach connects a pre-trained ECG encoder to a frozen LLM (e.g., LLaMA or Gemma) through a trainable fusion module, enabling the language model to reason about ECG data and generate clinically meaningful answers. Extensive experiments demonstrate better generalization to unseen diagnostic tasks than supervised baselines, with notable performance even when only a limited number of ECG leads is available. For instance, in a 5-way 5-shot setting, our method with LLaMA-3.1-8B achieves accuracies of 84.6%, 77.3%, and 69.6% on the single-verify, single-choose, and single-query question types, respectively. These results highlight the potential of our method to enhance clinical ECG interpretation by combining signal processing with the language understanding capabilities of LLMs, particularly in data-constrained scenarios.
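To make the "5-way 5-shot" setting concrete, the following is a minimal sketch of how a single few-shot episode could be assembled. It assumes the standard N-way K-shot convention from the few-shot learning literature (N classes per episode, K labeled support examples per class, plus held-out query examples). The function name `sample_episode`, the `(item, label)` data layout, and the query-set size are hypothetical choices for illustration and are not taken from this work.

```python
import random
from collections import defaultdict


def sample_episode(examples, n_way=5, k_shot=5, n_query=2, seed=None):
    """Build one N-way K-shot episode from a list of (item, label) pairs.

    Returns (support, query), each a list of (item, label) pairs.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group items by their class label.
    by_label = defaultdict(list)
    for item, label in examples:
        by_label[label].append(item)

    # Only classes with enough items can fill both support and query sets.
    eligible = [c for c, items in by_label.items()
                if len(items) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for cls in classes:
        # Draw disjoint support and query items for this class.
        picked = rng.sample(by_label[cls], k_shot + n_query)
        support += [(x, cls) for x in picked[:k_shot]]
        query += [(x, cls) for x in picked[k_shot:]]
    return support, query


# Toy stand-in data: 8 classes with 10 placeholder items each.
toy = [(f"ecg_{c}_{i}", f"class_{c}") for c in range(8) for i in range(10)]
support, query = sample_episode(toy, seed=0)
print(len(support), len(query))  # 25 10
```

In a meta-learning loop, a model would typically adapt on the support set of each episode and be evaluated on its query set; how the method described here performs that adaptation is not specified in the abstract.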