Despite significant advances, large language models (LLMs) still struggle to answer accurately when they lack domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks that target the knowledge databases of RAG systems. We show that previous prompt injection-based extraction attacks rely largely on the instruction-following capabilities of LLMs. As a result, they fail on models that are less responsive to such malicious prompts; for example, our experiments show that state-of-the-art attacks achieve near-zero success on Gemma-2B-IT. Moreover, even for models that can follow these instructions, we find that fine-tuning may significantly reduce attack performance. To further expose this vulnerability, we propose backdooring RAG: a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM. When this compromised LLM is integrated into a RAG system, attackers can use specific triggers in prompts to make the LLM leak documents from the retrieval database. By carefully designing the poisoned data, we achieve both verbatim and paraphrased document extraction. For example, on Gemma-2B-IT, with only 5% poisoned data, our method achieves an average success rate of 94.1% for verbatim extraction (ROUGE-L score: 82.1) and 63.6% for paraphrased extraction (average ROUGE score: 66.4) across four datasets. These results underscore the supply-chain privacy risks of deploying RAG systems.
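The abstract reports extraction quality as ROUGE-L scores. As a rough illustration of what such a score measures, the following minimal sketch computes an LCS-based ROUGE-L F-measure between a stored document and a model output. The whitespace tokenization, the lowercasing, the balanced F-measure, the 0-100 scaling, and the function names `lcs_length` and `rouge_l` are all assumptions made for this example; the paper's exact ROUGE configuration is not stated in the abstract.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F-measure on a 0-100 scale (assumed convention).

    Tokens are lowercased, whitespace-separated words; this is an
    illustrative choice, not necessarily the paper's setup.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the reference that was recovered
    precision = lcs / len(cand)  # share of the output that matches the reference
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical strings, used only to exercise the function.
    stored_document = "the patient was prescribed ten milligrams daily"
    model_output = "the patient was prescribed a dose of ten milligrams"
    print(round(rouge_l(stored_document, model_output), 1))
```

Under these assumptions, a score near 100 means the output reproduces the stored document almost word for word, which is why a high ROUGE-L value is a natural indicator of verbatim leakage, while paraphrased extraction would be expected to score lower on the same metric.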