Inference Cost Attacks for Retrieval-Augmented Large Language Models

Retrieval-Augmented Generation (RAG) makes LLM systems more capable, but it also raises inference costs substantially, because it adds a multi-stage pipeline that dynamically retrieves and synthesizes information from external knowledge sources. This high operational cost exposes RAG-enhanced LLM systems to Inference Cost Attacks (ICAs). However, existing ICAs often rely on the impractical assumption that the attacker can manipulate the prompt directly. We argue that a more feasible and potent threat arises from poisoning external knowledge bases (e.g., web content retrieved from the Internet). In this work, we introduce the Retrieval-Augmented Inference Cost Attack (RA-ICA), a novel attack paradigm that targets the computational cost of RAG-enhanced LLM systems by injecting malicious documents into the external knowledge corpus. To operationalize this attack, we propose Computational Resource Exhaustion via External Poisoning (CREEP), a framework that uses LLM agents to automatically craft malicious documents that are both semantically relevant enough to be retrieved and effective at inducing an abnormal increase in token consumption during inference. To enhance the attack's effectiveness, we introduce Memory-Augmented Group Relative Policy Optimization (MA-GRPO), a reinforcement learning algorithm that fine-tunes the agents by learning from a dynamic memory of the best adversarial documents found so far. Extensive experiments across three real-world datasets demonstrate that RA-ICA increases token consumption by up to 13.12 times with a success rate above 90%, without degrading the integrity of the generated answers.
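To make the headline numbers concrete, the following minimal sketch shows one way a token amplification factor and an attack success rate could be computed from per-query token counts. It is an illustration only: the abstract does not define how success is measured, so the `QueryCost` record, the `summarize` helper, and the success threshold of 2.0 are all hypothetical assumptions rather than the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class QueryCost:
    """Token counts for one query (hypothetical record, not from the paper)."""
    baseline_tokens: int  # tokens consumed with a clean knowledge corpus
    attacked_tokens: int  # tokens consumed with the poisoned corpus


def amplification(cost: QueryCost) -> float:
    """Ratio of attacked to baseline token consumption for one query."""
    if cost.baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return cost.attacked_tokens / cost.baseline_tokens


def summarize(costs: Sequence[QueryCost], threshold: float = 2.0) -> Dict[str, float]:
    """Aggregate amplification statistics over a set of queries.

    A query counts as a success when its amplification reaches `threshold`.
    The threshold value is an assumption made for this sketch.
    """
    if not costs:
        raise ValueError("costs must not be empty")
    ratios = [amplification(c) for c in costs]
    successes = sum(1 for r in ratios if r >= threshold)
    return {
        "max_amplification": max(ratios),
        "mean_amplification": sum(ratios) / len(ratios),
        "success_rate": successes / len(ratios),
    }


if __name__ == "__main__":
    # Made-up token counts, used only to exercise the functions.
    sample = [QueryCost(100, 1312), QueryCost(200, 300), QueryCost(50, 200)]
    print(summarize(sample))
```

Under this definition, a reported figure such as "up to 13.12 times" would correspond to `max_amplification`, while the success rate depends entirely on the chosen threshold.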
