The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factuality issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which leverages external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, a limited number of retrieved contexts may not cover all necessary information, while excessive retrieval can introduce irrelevant or inaccurate references that interfere with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts during generation. We demonstrate the effectiveness of RULE on three medical VQA datasets, achieving an average improvement of 20.8% in factual accuracy. We publicly release our benchmark and code at https://github.com/richard-peng-xia/RULE.