With the help of advanced LLM application development frameworks, a growing number of LLM-powered applications can readily augment an LLM's knowledge with external content using retrieval-augmented generation (RAG). However, the design of these frameworks does not sufficiently account for the risks posed by external content, which allows attackers to undermine applications built on them. In this paper, we reveal a new threat to LLM-powered applications, termed retrieval poisoning, in which attackers steer the application into producing malicious responses during the RAG process. Specifically, by analyzing LLM application frameworks, attackers can craft malicious documents that are visually indistinguishable from benign ones. Although the documents present correct information, once they are used as reference sources for RAG, the application is misled into generating incorrect responses. Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate and achieve a 66.67% success rate in a real-world application, demonstrating the potential impact of retrieval poisoning.