Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors

Despite significant advances, large language models (LLMs) still struggle to provide accurate answers when they lack domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks that target RAG's knowledge databases. We show that previous prompt-injection-based extraction attacks rely largely on the instruction-following capabilities of LLMs. As a result, they fail on models that respond less readily to such malicious prompts; for example, our experiments show that state-of-the-art attacks achieve near-zero success on Gemma-2B-IT. Moreover, even for models that can follow these instructions, we find that fine-tuning may significantly reduce attack performance. To further expose this vulnerability, we propose backdooring RAG: a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM. When this compromised LLM is integrated into a RAG system, attackers can use specific triggers in prompts to make the LLM leak documents from the retrieval database. By carefully designing the poisoned data, we achieve both verbatim and paraphrased document extraction. For example, on Gemma-2B-IT, with only 5% poisoned data, our method achieves an average success rate of 94.1% for verbatim extraction (ROUGE-L score: 82.1) and 63.6% for paraphrased extraction (average ROUGE score: 66.4) across four datasets. These results underscore the supply-chain privacy risks of deploying RAG systems.
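The abstract reports extraction quality as ROUGE-L scores. As a rough illustration of what that number measures, the sketch below computes a ROUGE-L F1 score between a retrieved document and a model's output, based on the longest common subsequence (LCS) of their tokens. It assumes lowercase whitespace tokenization and the balanced F1 variant (beta = 1). The paper's exact tokenizer, ROUGE variant, and averaging procedure are not stated here, so this sketch need not reproduce the reported numbers. The function names `lcs_length`, `rouge_l`, and `mean_rouge_l` are hypothetical.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists.

    Standard O(len(a) * len(b)) dynamic program, keeping only two rows.
    """
    # prev[j] holds the LCS length of the tokens of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from either side
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on a 0-100 scale.

    Assumption: lowercase whitespace tokenization and F1 (beta = 1);
    the paper's exact configuration may differ.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)        # share of the document recovered
    precision = lcs / len(cand)    # share of the output that matches the document
    return 100.0 * 2 * precision * recall / (precision + recall)


def mean_rouge_l(pairs: List[Tuple[str, str]]) -> float:
    """Average ROUGE-L over (reference document, model output) pairs."""
    if not pairs:
        return 0.0
    return sum(rouge_l(ref, out) for ref, out in pairs) / len(pairs)
```

A verbatim leak scores 100. Changing a single word in a six-token sentence ("the cat sat on the mat" vs. "the cat is on the mat") leaves an LCS of 5 tokens, giving about 83.3. Because the score rewards in-order token overlap, reworded outputs tend to score lower than verbatim copies even when they convey the same content.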
