Retrieval-Augmented Generation (RAG) mitigates LLM hallucinations but makes generated answers depend on the integrity of the retrieval corpus, which introduces a critical vulnerability. We present SilentRetrieval, a two-stage data poisoning attack that hijacks RAG systems through adversarially crafted yet fluent documents. Stage 1 uses Coordinated Beam Search, a multi-token joint optimization method with a fluency-similarity objective, to keep a poisoned host document retrievable while constraining its perplexity. Stage 2 uses Context-Adaptive Trigger Generation, a lightweight trigger-fusion step driven by a frozen LLM, to integrate manipulation triggers into the document content. Under a one-poisoned-document-per-query evaluation with synthetic target answers, SilentRetrieval achieves 84.6%/81.3% HR@10 and 57.5%/54.8% ASR-LLM on Natural Questions and MS MARCO, respectively, while maintaining near-benign perplexity. Cross-model evaluation across four target LLMs shows nontrivial effectiveness under a fixed trigger generator, and transfer tests against unseen retrievers, including ColBERT and commercial embedding models, yield 64.7% average HR@10 under the same injected-corpus protocol. In a sampled Wikipedia-scale evaluation, SilentRetrieval retains 74.2% HR@10 at a 0.016% poisoning ratio. Combining retrieval-side and generation-side defenses reduces attack success substantially but incurs a latency cost. In human evaluation, SilentRetrieval documents are flagged substantially less often than disfluent baselines, although at the current sample size they remain numerically more suspicious than benign content.