Automated interpretation of electrocardiograms (ECGs) has attracted significant attention as machine learning methods have advanced. Despite this growing interest, most current studies focus solely on classification or regression tasks and overlook a crucial aspect of clinical cardiac disease diagnosis: the diagnostic report written by experienced clinicians. In this paper, we introduce a novel approach to ECG interpretation that leverages recent breakthroughs in Large Language Models (LLMs) and Vision Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose automatically retrieving the most similar clinical cases for a given input ECG. Because interpreting ECGs as images is more affordable and accessible, we process ECGs as encoded images and adopt a vision-language learning paradigm to jointly learn the alignment between encoded ECG images and ECG diagnosis reports. Encoding ECGs as images yields an efficient ECG retrieval system that is practical for clinical applications. More importantly, our findings could serve as a resource for providing diagnostic services in underdeveloped regions.
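The retrieval idea in the abstract, finding the most similar clinical cases for an input ECG, can be illustrated with a minimal sketch. It assumes that a trained image encoder and text encoder (not shown) have already mapped ECG images and diagnosis reports into a shared embedding space, and that similarity is measured with cosine similarity; the abstract does not name the similarity measure, so that choice is an assumption. The function names `l2_normalize` and `retrieve_reports` and the toy vectors are hypothetical and only serve the illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_reports(ecg_embedding, report_embeddings, reports, top_k=3):
    """Rank reports by cosine similarity to the ECG embedding; return the top_k."""
    query = l2_normalize(ecg_embedding)
    scored = []
    for text, emb in zip(reports, report_embeddings):
        candidate = l2_normalize(emb)
        # Dot product of unit vectors equals cosine similarity.
        score = sum(q * c for q, c in zip(query, candidate))
        scored.append((text, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, made-up embeddings standing in for encoder outputs (assumption).
reports = [
    "sinus rhythm, normal ECG",
    "atrial fibrillation",
    "left bundle branch block",
]
report_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]
ecg_embedding = [0.1, 0.9, 0.1]

results = retrieve_reports(ecg_embedding, report_embeddings, reports, top_k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In a system of this kind, the report embeddings would typically be computed once and indexed, so that only the query ECG needs to be encoded at retrieval time.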