We propose Retrieval Augmented Generation (RAG) as an approach to automated radiology report writing. The approach uses multimodally aligned embeddings from a contrastively pretrained vision-language model to retrieve candidate radiology text relevant to an input radiology image, and then uses a general-domain generative model, such as OpenAI text-davinci-003, gpt-3.5-turbo, or gpt-4, to generate the report from the retrieved text. This keeps hallucinated generations in check and, by leveraging the instruction-following capabilities of these generative models, lets us generate report content in the format we desire. Our approach achieves better clinical metrics, with a BERTScore of 0.2865 (Δ +25.88%) and an Semb score of 0.4026 (Δ +6.31%). The approach can be broadly relevant across clinical settings: it allows the automated report generation process to be augmented with content relevant to a given setting, and user intents and requirements can be injected into the prompts to modulate the content and format of the generated reports as appropriate for that setting.
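The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `retrieve_top_k` and `build_prompt`, the toy three-dimensional embeddings, the candidate sentences, and the prompt wording are all hypothetical. In practice the embeddings would come from the contrastively pretrained vision-language model, and the resulting prompt would be sent to the chosen generative model.

```python
import numpy as np


def retrieve_top_k(image_emb, text_embs, texts, k=3):
    """Return the k candidate texts most cosine-similar to the image embedding."""
    # Normalise so that the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = text_embs @ image_emb
    # Indices of the k highest scores, best first.
    top = np.argsort(-scores)[:k]
    return [texts[i] for i in top]


def build_prompt(retrieved, instruction="Write the impression section of a radiology report."):
    """Assemble a prompt that restricts the generator to the retrieved text.

    The instruction string is where user intents and format requirements
    would be injected (hypothetical wording).
    """
    context = "\n".join(f"- {sentence}" for sentence in retrieved)
    return f"{instruction}\nUse only the findings listed below.\n{context}\nReport:"


# Toy data standing in for real embeddings (assumption: shared 3-d space).
texts = [
    "No pleural effusion.",
    "Cardiomegaly is present.",
    "Right lower lobe opacity.",
]
text_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
image_emb = np.array([0.1, 0.9, 0.4])

retrieved = retrieve_top_k(image_emb, text_embs, texts, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

Because the generator is told to use only the retrieved sentences, changing the instruction string changes the format of the report without changing which findings are available to it.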