One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image

Retrieval-augmented generation (RAG) is instrumental in reducing hallucinations in large language models (LLMs) by grounding them in a factual knowledge base (KB). Although PDF documents are prominent sources of knowledge, text-based RAG pipelines are ineffective at capturing their rich multi-modal information. In contrast, visual document RAG (VD-RAG) uses screenshots of document pages as the KB, which has been shown to achieve state-of-the-art results. However, by adding the image modality, VD-RAG opens new attack vectors through which adversaries can disrupt the system by injecting malicious documents into the KB. In this paper, we demonstrate the vulnerability of VD-RAG to poisoning attacks targeting both retrieval and generation. We define two attack objectives and demonstrate that both can be realized by injecting only a single adversarial image into the KB. First, we introduce a targeted attack against a single query or a group of queries, with the goal of spreading targeted disinformation. Second, we present a universal attack that, for any potential user query, influences the response to cause a denial-of-service in the VD-RAG system. We investigate the two attack objectives under both white-box and black-box assumptions, employing both a multi-objective gradient-based optimization approach and prompting of state-of-the-art generative models. Using two visual document datasets and a diverse set of state-of-the-art retrievers (embedding models) and generators (vision language models), we show that VD-RAG is vulnerable to poisoning attacks in both the targeted and universal settings, while remaining robust to black-box attacks in the universal setting.
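As an illustration of what a multi-objective gradient-based attack on both retrieval and generation could look like, the following is a minimal sketch under assumed notation; none of the symbols below are taken from the paper. Here $x$ is the single adversarial image, $\mathcal{Q}$ is a set of queries (the targeted query or queries, or a sample standing in for arbitrary user queries in the universal case), $f_R$ is the retriever's embedding function, $\mathrm{sim}$ is its similarity score, $p_G$ is the generator's output distribution, $y^{\star}$ is the attacker's desired response, and $\lambda_R, \lambda_G, \alpha$ are hypothetical weights and step size.

```latex
% Retrieval objective (assumed form): make x score highly for every query in Q.
\mathcal{L}_{\mathrm{ret}}(x) = -\frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}}
    \mathrm{sim}\bigl(f_R(q),\, f_R(x)\bigr)

% Generation objective (assumed form): make the generator emit y* when x is retrieved.
\mathcal{L}_{\mathrm{gen}}(x) = -\frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}}
    \log p_G\bigl(y^{\star} \mid q,\, x\bigr)

% Weighted combination, with x constrained to valid pixel values.
x^{\star} = \arg\min_{x \in [0,1]^{H \times W \times 3}}
    \lambda_R\, \mathcal{L}_{\mathrm{ret}}(x) + \lambda_G\, \mathcal{L}_{\mathrm{gen}}(x)

% One projected gradient step, where \Pi clips pixels back into [0,1].
x_{t+1} = \Pi_{[0,1]}\Bigl( x_t - \alpha\, \nabla_x \bigl[
    \lambda_R\, \mathcal{L}_{\mathrm{ret}}(x_t) + \lambda_G\, \mathcal{L}_{\mathrm{gen}}(x_t) \bigr] \Bigr)
```

In this sketch, a targeted attack would correspond to a small $\mathcal{Q}$ with $y^{\star}$ set to the intended disinformation, while a universal denial-of-service attack would use a broad $\mathcal{Q}$ with $y^{\star}$ set to an unhelpful response such as a refusal. The weighted-sum form and the projection step are only one common way to combine two objectives; the paper's actual losses and constraints may differ.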
