Retrieval-Augmented Generation (RAG) systems enhance response credibility and traceability by displaying reference contexts, but this transparency also introduces a novel black-box attack vector. Existing document poisoning attacks, in which adversaries inject malicious documents into the knowledge base to manipulate RAG outputs, rely primarily on unrealistic white-box or gray-box assumptions, which limits their practical applicability. To address this gap, we propose CtrlRAG, a two-stage black-box attack that (1) constructs malicious documents containing misinformation or emotion-inducing content and injects them into the knowledge base, and (2) iteratively optimizes them using a localization algorithm and a Masked Language Model (MLM) guided by reference context feedback, ensuring their retrieval priority while preserving linguistic naturalness. With only five malicious documents injected per target question into the million-document MS MARCO dataset, CtrlRAG achieves attack success rates of up to 90% on commercial LLMs (e.g., GPT-4o) in both the *Emotion Manipulation* and *Hallucination Amplification* tasks, a 30% improvement over the strongest baselines. Furthermore, we show that existing defenses fail to balance security and performance. To address this trade-off, we introduce a dynamic *Knowledge Expansion* defense strategy based on *Parametric/Non-parametric Memory Confrontation*, which blocks 78% of attacks while maintaining 95.5% system accuracy. Our findings reveal critical vulnerabilities in RAG systems and provide effective defense strategies.
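The abstract describes stage (2) only at a high level. As a rough illustration of how displayed reference contexts can serve as a black-box optimization signal, the Python sketch below greedily substitutes words in an injected document and keeps an edit only when that document climbs in the displayed context list. Everything here is an assumption for illustration: the Jaccard word-overlap retriever, the names `retrieve`, `feedback_rank`, `optimize`, and `propose_synonyms`, and the tiny synonym table standing in for MLM proposals are all hypothetical, and a naive scan over every word position stands in for CtrlRAG's localization algorithm. It is not the CtrlRAG implementation.

```python
from typing import Callable, Dict, List, Tuple


def retrieve(question: str, kb: List[str], k: int = 3) -> List[str]:
    """Hypothetical black-box retriever: returns the top-k reference contexts.

    The scoring function (word-overlap Jaccard similarity) is internal; the
    attacker only observes the ordered list of displayed contexts.
    """
    q = set(question.lower().split())

    def score(doc: str) -> float:
        d = set(doc.lower().split())
        return len(q & d) / len(q | d)

    return sorted(kb, key=score, reverse=True)[:k]


def feedback_rank(doc: str, contexts: List[str]) -> int:
    """Position of doc among the displayed contexts (0 = top), or len(contexts) if absent."""
    return contexts.index(doc) if doc in contexts else len(contexts)


def optimize(
    question: str,
    doc: str,
    kb: List[str],
    propose: Callable[[List[str], int], List[str]],
    k: int = 3,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Greedy word substitution that keeps an edit only if the displayed rank improves."""
    words = doc.split()
    best = feedback_rank(doc, retrieve(question, kb + [doc], k))
    for _ in range(max_rounds):
        improved = False
        for i in range(len(words)):  # naive stand-in for localization: visit every position
            for cand in propose(words, i):
                trial = words[:i] + [cand] + words[i + 1:]
                trial_doc = " ".join(trial)
                # Query the black box with the edited document injected into the KB.
                rank = feedback_rank(trial_doc, retrieve(question, kb + [trial_doc], k))
                if rank < best:
                    words, best, improved = trial, rank, True
                    break
        if best == 0 or not improved:
            break
    return " ".join(words), best


# Hypothetical stand-in for MLM proposals: a tiny table of fluent replacements.
SYNONYMS: Dict[str, List[str]] = {"penned": ["authored", "wrote"], "novel": ["book"]}


def propose_synonyms(words: List[str], i: int) -> List[str]:
    return SYNONYMS.get(words[i].lower(), [])


# Toy usage with a hypothetical three-document knowledge base.
kb = [
    "frank herbert wrote the novel dune in 1965",
    "dune is a desert planet in a science fiction novel",
    "the novel was adapted into a film",
]
poisoned, rank = optimize(
    "who wrote the novel dune", "isaac asimov penned the novel dune", kb, propose_synonyms
)
print(poisoned, rank)  # isaac asimov wrote the novel dune 0
```

In this toy run the misinformation document starts second in the displayed contexts and moves to first after a single substitution ("penned" to "wrote"), while "authored" and "book" are rejected because they do not improve its rank. Restricting candidates to fluent replacements is what keeps the edited text natural; in the sketch that role falls to the hand-written synonym table, whereas the abstract assigns it to the MLM.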