While large language models (LLMs) have achieved notable success in generative tasks, they still face limitations, such as lacking up-to-date knowledge and producing hallucinations. Retrieval-Augmented Generation (RAG) enhances LLM performance by integrating external knowledge bases, providing additional context which significantly improves accuracy and knowledge coverage. However, building these external knowledge bases often requires substantial resources and may involve sensitive information. In this paper, we propose an agent-based automated privacy attack called RAG-Thief, which can extract a scalable amount of private data from the private database used in RAG applications. We conduct a systematic study on the privacy risks associated with RAG applications, revealing that the vulnerability of LLMs makes the private knowledge bases suffer significant privacy risks. Unlike previous manual attacks which rely on traditional prompt injection techniques, RAG-Thief starts with an initial adversarial query and learns from model responses, progressively generating new queries to extract as many chunks from the knowledge base as possible. Experimental results show that our RAG-Thief can extract over 70% information from the private knowledge bases within customized RAG applications deployed on local machines and real-world platforms, including OpenAI's GPTs and ByteDance's Coze. Our findings highlight the privacy vulnerabilities in current RAG applications and underscore the pressing need for stronger safeguards.
翻译:尽管大型语言模型(LLM)在生成任务中取得了显著成功,但仍面临一些局限性,例如缺乏最新知识和产生幻觉。检索增强生成(RAG)通过集成外部知识库来增强LLM性能,提供额外的上下文,从而显著提高准确性和知识覆盖范围。然而,构建这些外部知识库通常需要大量资源,并且可能涉及敏感信息。本文提出了一种基于智能体的自动化隐私攻击方法RAG-Thief,能够从RAG应用所使用的私有数据库中可扩展地提取大量私有数据。我们对RAG应用相关的隐私风险进行了系统性研究,揭示了LLM的脆弱性使得私有知识库面临重大的隐私风险。与以往依赖传统提示注入技术的手动攻击不同,RAG-Thief从一个初始对抗性查询开始,通过从模型响应中学习,逐步生成新的查询,以尽可能多地从知识库中提取数据块。实验结果表明,我们的RAG-Thief能够从部署在本地机器和真实世界平台(包括OpenAI的GPTs和字节跳动的Coze)上的定制RAG应用中,提取私有知识库中超过70%的信息。我们的研究结果凸显了当前RAG应用中存在的隐私漏洞,并强调了加强安全防护的迫切需求。