Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. We also study multiple effects of RAG setup on the extractability of data, indicating that following unexpected instructions to regurgitate data can be an outcome of failure in effectively utilizing contexts for modern LMs, and further show that such vulnerability can be greatly mitigated by position bias elimination strategies. Extending our study to production RAG models GPTs, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.
翻译:检索增强生成(RAG)通过在测试时融入外部知识来改进预训练模型,从而实现定制化适应。本研究探讨了检索上下文RAG语言模型(LM)中数据存储泄露的风险。我们证明,攻击者可以利用LM的指令遵循能力,通过提示注入轻松地从基于指令调优LM构建的RAG系统的数据存储中逐字提取文本数据。该漏洞广泛存在于包括Llama2、Mistral/Mixtral、Vicuna、SOLAR、WizardLM、Qwen1.5和Platypus2在内的多种现代LM中,且模型规模越大,可利用性越强。我们还研究了RAG设置对数据可提取性的多重影响,表明遵循意外指令而泄露数据可能是现代LM未能有效利用上下文的结果,并进一步证明通过消除位置偏置策略可大幅缓解此类漏洞。将研究扩展到生产级RAG模型GPTs,我们设计了一种攻击方法:在随机选取的25个定制GPTs上仅需最多2次查询即可实现100%成功率的数据存储泄露;通过仅使用GPTs自身生成的100条查询提示,我们以41%的比率从77,000词的书籍中逐字提取文本数据,并以3%的比率从1,569,000词的语料库中提取数据。