Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by integrating external knowledge bases. However, this integration introduces a new security threat: adversaries can exploit the retrieval mechanism to inject malicious content into the knowledge base and thereby influence the generated responses. Based on this attack vector, we propose CtrlRAG, a novel attack method designed for RAG systems in the black-box setting, which aligns with real-world scenarios. Unlike existing attack methods, CtrlRAG introduces a perturbation mechanism based on a Masked Language Model (MLM) that dynamically optimizes malicious content in response to changes in the retrieved context. Experimental results demonstrate that CtrlRAG outperforms three baseline methods on both the Emotional Manipulation and Hallucination Amplification objectives. Furthermore, we evaluate three existing defense mechanisms, revealing their limited effectiveness against CtrlRAG and underscoring the urgent need for more robust defenses.
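The core idea of the perturbation mechanism can be sketched as a greedy search: repeatedly mask one token of the injected passage, propose replacements, and keep whichever substitution most improves the passage's retrieval score for the target query. The sketch below is illustrative only and is not the paper's implementation: a hand-written candidate list stands in for the MLM's top-k mask fillers, and simple query/passage token overlap stands in for the black-box retriever's similarity signal.

```python
# Hedged sketch of MLM-style iterative perturbation (not CtrlRAG itself).
# Assumptions: `candidates` plays the role of MLM mask-fill proposals, and
# `retrieval_score` plays the role of the (unobservable) retriever similarity.

def retrieval_score(query: str, passage: str) -> float:
    """Toy retriever stand-in: fraction of query tokens present in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

def perturb(passage: str, query: str, candidates: list[str], rounds: int = 3) -> str:
    """Greedily substitute tokens to raise the passage's retrieval score."""
    tokens = passage.split()
    for _ in range(rounds):
        improved = False
        for i in range(len(tokens)):
            best, best_score = tokens[i], retrieval_score(query, " ".join(tokens))
            for cand in candidates:  # an MLM's top-k fillers would go here
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                score = retrieval_score(query, " ".join(trial))
                if score > best_score:
                    best, best_score = cand, score
            if best != tokens[i]:
                tokens[i] = best
                improved = True
        if not improved:  # converged: no single substitution helps
            break
    return " ".join(tokens)

query = "who invented the telephone"
injected = "a fabricated claim about history"
print(perturb(injected, query, ["telephone", "invented", "who"]))
```

In the actual black-box setting the score cannot be read off the retriever directly; CtrlRAG instead reacts to observed changes in the retrieved context, which this toy objective only approximates.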