Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) is a widely adopted paradigm for enhancing large language models (LLMs) in medical applications by incorporating expert multimodal knowledge during generation. However, the underlying retrieval databases may naturally contain, or be intentionally injected with, adversarial knowledge, which can perturb model outputs and undermine system reliability. To investigate this risk, prior studies have explored knowledge poisoning attacks on medical RAG systems. Most of them, however, rely on the strong assumption that adversaries know the user queries in advance, which is unrealistic in real-world deployments and substantially limits their practical applicability. In this paper, we propose M\textsuperscript{3}Att, a knowledge-poisoning framework for medical multimodal RAG systems that assumes only limited knowledge of the distribution of the underlying database. Our core idea is to inject covert misinformation into textual data while using the paired visual data as a query-agnostic trigger to promote retrieval. We first propose a unified framework that introduces imperceptible perturbations to visual inputs to manipulate retrieval probabilities. Moreover, because LLMs carry prior medical knowledge, naively poisoned medical content with explicit factual errors can be corrected during generation. We therefore leverage the inherent ambiguity of medical diagnosis and design a covert misinformation injection strategy that degrades diagnostic accuracy while evading model self-correction. Experiments on five LLMs and datasets demonstrate that M\textsuperscript{3}Att consistently produces clinically plausible yet incorrect generations. Code: https://github.com/ypr17/M3Att.
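
To make the idea of a query-agnostic visual trigger concrete, the following is a minimal sketch under stated assumptions, not the M\textsuperscript{3}Att implementation. It assumes that retrieval ranks database images by cosine similarity in an embedding space, that the encoder is a toy linear map `W` (a stand-in for a real vision encoder), and that the adversary holds a small sample of images drawn from the database distribution. The poisoned image is pushed toward the centroid of the sample embeddings with a bounded, sign-gradient perturbation, so no specific user query is needed. All names (`perturb_image`, `eps`, `step`, `iters`) are hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def cosine_to_target(W, x, target):
    """Cosine similarity between the embedding of x and a unit-norm target."""
    return float(unit(W @ x) @ target)


def perturb_image(W, x, target, eps=8 / 255, step=1 / 255, iters=40):
    """Projected sign-gradient ascent on cosine similarity to `target`.

    The perturbation is kept inside an L-infinity ball of radius `eps`
    around the clean image, and pixel values stay in [0, 1].
    The best iterate seen so far is returned.
    """
    x_adv = x.copy()
    best, best_score = x.copy(), cosine_to_target(W, x, target)
    for _ in range(iters):
        z = W @ x_adv
        n = np.linalg.norm(z)
        # Gradient of (z . target) / ||z|| with respect to z, then x.
        grad_z = target / n - (z @ target) * z / n**3
        grad_x = W.T @ grad_z
        x_adv = x_adv + step * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay imperceptible
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
        score = cosine_to_target(W, x_adv, target)
        if score > best_score:
            best, best_score = x_adv.copy(), score
    return best


# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))                   # stand-in linear encoder
sample_images = rng.uniform(size=(32, 64))      # adversary's distribution sample
target = unit(np.mean([unit(W @ s) for s in sample_images], axis=0))

x = rng.uniform(size=64)                        # image paired with poisoned text
x_adv = perturb_image(W, x, target)

before = cosine_to_target(W, x, target)
after = cosine_to_target(W, x_adv, target)
print(f"cosine to centroid: before={before:.4f}, after={after:.4f}")
```

In this sketch the centroid plays the role of a proxy for unseen queries: an image whose embedding lies closer to the typical embedding of the distribution is more likely to rank highly for many queries at once. A real attack would replace the linear map with the retriever's actual encoder and compute gradients by automatic differentiation.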
