LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation

In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that these latent features ought to be high-dimensional vectors that require model fine-tuning to handle. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a text-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG), where the task is to generate a clinician's report detailing their observations from a set of radiological images, such as X-rays. We argue that simple linear classifiers over extracted image embeddings can effectively map X-rays into text space as radiology-specific labels. In combination with standard RAG, we show that these derived text labels can be used with general-domain LLMs to generate radiology reports. Without ever training our generative language model or image feature encoder models, and without ever directly "showing" the LLM an X-ray, we demonstrate that LaB-RAG achieves better results across natural language and radiology language metrics than other retrieval-based RRG methods, while attaining results competitive with fine-tuned vision-language RRG models. We further present results of our experiments with various components of LaB-RAG to better understand our method. Finally, we critique the use of a popular RRG metric, arguing that it is possible to artificially inflate its results without true data leakage.
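The pipeline described in the abstract (linear classifiers over image embeddings, labels as text, retrieval, then a prompt for a frozen LLM) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-item label set, the 2-dimensional toy embeddings, the hand-set classifier weights, the rule of ranking retrieved examples by label overlap before cosine similarity, and the prompt wording. The LLM call itself is omitted; the sketch stops at the text prompt.

```python
import math

# Hypothetical label set; the abstract does not list the labels used.
LABELS = ["Cardiomegaly", "Edema", "Pleural Effusion"]


def predict_labels(embedding, weights, biases):
    """Apply one linear classifier per label; positive if w.x + b > 0."""
    positive = []
    for name in LABELS:
        score = sum(w * x for w, x in zip(weights[name], embedding)) + biases[name]
        if score > 0:
            positive.append(name)
    return positive


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_emb, query_labels, corpus, k=2):
    """Assumed ranking rule: label overlap first, embedding similarity second."""
    def key(entry):
        overlap = len(set(entry["labels"]) & set(query_labels))
        return (overlap, cosine(query_emb, entry["embedding"]))
    return sorted(corpus, key=key, reverse=True)[:k]


def build_prompt(query_labels, examples):
    """Assemble a text-only prompt: retrieved reports plus labels, no image."""
    lines = ["Write a radiology report consistent with the target labels."]
    for i, ex in enumerate(examples, 1):
        lines.append(f"Example {i} labels: {', '.join(ex['labels']) or 'none'}")
        lines.append(f"Example {i} report: {ex['report']}")
    lines.append(f"Target labels: {', '.join(query_labels) or 'none'}")
    lines.append("Target report:")
    return "\n".join(lines)


# Toy classifier parameters and retrieval corpus (all values are made up).
weights = {
    "Cardiomegaly": [1.0, 0.0],
    "Edema": [0.0, 1.0],
    "Pleural Effusion": [-1.0, -1.0],
}
biases = {"Cardiomegaly": -0.5, "Edema": -0.5, "Pleural Effusion": 0.0}
corpus = [
    {"embedding": [1.0, 0.0], "labels": ["Cardiomegaly"],
     "report": "The heart is enlarged."},
    {"embedding": [0.0, 1.0], "labels": ["Edema"],
     "report": "Mild pulmonary edema."},
    {"embedding": [0.8, 0.3], "labels": [],
     "report": "No acute findings."},
]

query_embedding = [0.9, 0.1]  # stands in for a frozen image encoder's output
query_labels = predict_labels(query_embedding, weights, biases)
examples = retrieve(query_embedding, query_labels, corpus, k=2)
prompt = build_prompt(query_labels, examples)
print(prompt)
```

In this sketch the only fitted parameters would be the per-label linear weights; the image encoder and the language model stay frozen, which mirrors the abstract's claim that neither is trained.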

