Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG) leverage both their rich parametric knowledge and dynamic, external knowledge to excel in tasks such as Question Answering. While RAG enhances MLLMs by grounding responses in query-relevant external knowledge, this reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks, where misinformation or irrelevant knowledge is intentionally injected into external knowledge bases to manipulate model outputs into incorrect or even harmful responses. To expose such vulnerabilities in multimodal RAG, we propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies: Localized Poisoning Attack (LPA), which injects query-specific misinformation in both text and images for targeted manipulation, and Globalized Poisoning Attack (GPA), which provides false guidance during MLLM generation to elicit nonsensical responses across all queries. We evaluate our attacks across multiple tasks, models, and access settings, demonstrating that LPA successfully manipulates the MLLM into generating attacker-controlled answers, with a success rate of up to 56% on MultiModalQA. Moreover, GPA completely disrupts model generation, driving accuracy to 0% with just a single irrelevant knowledge injection. Our results highlight the urgent need for robust defenses against knowledge poisoning to safeguard multimodal RAG frameworks.
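To make the threat model concrete, the minimal sketch below illustrates the retrieval-stage foothold that LPA-style poisoning exploits: a single injected entry that echoes a target query's wording can outrank clean knowledge in a similarity-based retriever. This is not the paper's implementation; the toy `embed` function (a hashed bag-of-words stand-in for a real multimodal encoder), the corpus, and the query are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy embedding: hashed bag-of-words into a fixed-size unit vector.
# A stand-in for a real dense encoder; purely illustrative.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Clean knowledge base entries (hypothetical examples).
knowledge_base = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
]

# Localized poisoning: the attacker injects an entry that mirrors the
# target query's wording (so it retrieves well) but carries misinformation.
query = "Where is the Eiffel Tower located?"
poisoned = ("Where is the Eiffel Tower located? "
            "The Eiffel Tower is located in Rome, Italy.")
knowledge_base.append(poisoned)

# Dense retrieval: rank all entries by cosine similarity to the query.
q = embed(query)
scores = [float(q @ embed(doc)) for doc in knowledge_base]
top = int(np.argmax(scores))

# The poisoned entry outranks the clean one because it echoes the query
# verbatim, so the downstream generator is conditioned on misinformation.
print(f"Retrieved: {knowledge_base[top]!r} (score={scores[top]:.3f})")
```

Because retrieval rewards lexical and semantic overlap with the query, an attacker who prepends the query text to a false statement reliably pushes the poisoned entry to the top of the retrieved context, which is the mechanism a query-specific attack like LPA leverages; defenses therefore need to vet knowledge-base contents, not just model outputs.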