Radiology reports are crucial for planning treatment strategies and facilitating effective doctor-patient communication, yet writing them manually places a significant burden on radiologists. Automatic radiology report generation offers a promising solution, but existing methods often rely on single-view radiographs, which limits diagnostic accuracy. To address this challenge, we propose \textbf{EVOKE}, a novel chest X-ray report generation framework that incorporates multi-view contrastive learning and patient-specific knowledge. Specifically, we introduce a multi-view contrastive learning method that enhances visual representations by aligning multi-view radiographs with their corresponding reports. We then present a knowledge-guided report generation module that integrates available patient-specific indications (e.g., symptom descriptions) to guide the generation of accurate and coherent radiology reports. To support research on multi-view report generation, we construct the Multi-view CXR and Two-view CXR datasets from publicly available sources. EVOKE surpasses recent state-of-the-art methods across multiple datasets, achieving a 2.9\% F\textsubscript{1} RadGraph improvement on MIMIC-CXR, a 7.3\% BLEU-1 improvement on MIMIC-ABN, a 3.1\% BLEU-4 improvement on Multi-view CXR, and an 8.2\% F\textsubscript{1,mic-14} CheXbert improvement on Two-view CXR.
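The multi-view image-report alignment described above can be illustrated with a minimal sketch. This is not EVOKE's actual implementation (the abstract does not specify architectural or loss details); it is a generic symmetric InfoNCE-style contrastive loss, with hypothetical names, that aligns a fused multi-view radiograph embedding with its paired report embedding within a batch:

```python
# Illustrative sketch only: a CLIP-style symmetric contrastive loss for
# aligning multi-view radiograph embeddings with report embeddings.
# All names and the mean-pooling view fusion are assumptions, not EVOKE's
# published design.
import torch
import torch.nn.functional as F


def multiview_report_contrastive_loss(view_embs, report_embs, temperature=0.07):
    """InfoNCE loss between fused multi-view image and report embeddings.

    view_embs:   (B, V, D) - V radiograph views per study
    report_embs: (B, D)    - one report embedding per study
    """
    # Fuse the views of each study (here: mean pooling), then
    # L2-normalise both modalities so dot products are cosine similarities.
    img = F.normalize(view_embs.mean(dim=1), dim=-1)   # (B, D)
    txt = F.normalize(report_embs, dim=-1)             # (B, D)

    # Pairwise similarity matrix; diagonal entries are the matched pairs.
    logits = img @ txt.t() / temperature               # (B, B)
    targets = torch.arange(img.size(0), device=img.device)

    # Symmetric cross-entropy: image-to-report and report-to-image.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In this sketch, each study's views are pulled toward their own report and pushed away from the other reports in the batch; the view-fusion step is where a multi-view method would differ most from single-view contrastive pretraining.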