Automated interpretation of electrocardiograms (ECGs) has attracted significant attention as machine learning methods have advanced. However, most current studies treat it solely as a classification or regression task and overlook a crucial aspect of clinical cardiac disease diagnosis: the diagnostic report written by experienced clinicians. In this paper, we introduce a novel approach to ECG interpretation that leverages recent breakthroughs in Large Language Models (LLMs) and Vision Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose automatically retrieving the most similar clinical cases for a given input ECG. Because interpreting ECGs as images is more affordable and accessible, we process ECGs as encoded images and adopt a vision-language learning paradigm to jointly learn the alignment between encoded ECG images and ECG diagnosis reports. Encoding ECGs as images can yield an efficient ECG retrieval system that is practical for clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in regions where, owing to past underdevelopment, only paper-printed ECGs are available.
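The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that an image encoder and a text encoder (both hypothetical here, and not taken from the paper) have already mapped ECG images and diagnosis reports into a shared embedding space, and it ranks stored reports by cosine similarity to a query ECG embedding. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_top_k(query_emb, report_embs, k=3):
    """Return indices and cosine scores of the k reports closest to the query.

    query_emb:   1-D array, embedding of the input ECG image (assumed given).
    report_embs: 2-D array, one row per stored diagnosis report embedding.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float).reshape(1, -1))
    r = l2_normalize(np.asarray(report_embs, dtype=float))
    scores = (r @ q.T).ravel()          # cosine similarity to every report
    k = min(k, scores.shape[0])         # never ask for more rows than exist
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top, scores[top]


# Toy stand-ins for encoder outputs (hypothetical values, not real data).
report_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_emb = np.array([0.9, 0.1, 0.0])

indices, scores = retrieve_top_k(query_emb, report_embs, k=2)
print(indices, scores)
```

In a system of the kind the abstract describes, the rows of `report_embs` would presumably be computed once for the whole case database, so that answering a query only requires encoding one ECG image and performing a single matrix product.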