Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer the LLM's output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, particularly under such attacks. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) under a variety of attack strategies on RAG. We show that SDAG substantially outperforms the standard causal attention mechanism, yielding a markedly lower attack success rate. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods: the integration yields performance that is statistically significantly better than the state-of-the-art.
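The core mechanism described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal NumPy illustration, under the assumption that each token carries a segment id (0 for the shared prompt/query, 1..k for the k retrieved documents), of a causal mask that additionally blocks attention across different documents. The function name `sdag_style_mask` is hypothetical.

```python
import numpy as np

def sdag_style_mask(segment_ids):
    """Build a boolean attention mask (True = position may be attended).

    segment_ids: one id per token. Id 0 marks shared prompt/query
    tokens; ids 1..k mark the k retrieved documents. A token may
    attend causally within its own segment and to the shared prompt,
    but never to tokens of a different retrieved document.
    """
    seg = np.asarray(segment_ids)
    n = len(seg)
    causal = np.tril(np.ones((n, n), dtype=bool))   # standard causal mask
    same_segment = seg[:, None] == seg[None, :]     # within-document attention
    to_shared = (seg == 0)[None, :]                 # any token may see the prompt
    return causal & (same_segment | to_shared)

# Two prompt tokens, then two documents of two tokens each.
mask = sdag_style_mask([0, 0, 1, 1, 2, 2])
```

With this mask, a token of document 2 (index 4) cannot attend to document 1 (`mask[4, 2]` is `False`) but still sees the prompt (`mask[4, 1]` is `True`); attention within a document remains causal as usual. A full system would additionally let the generated answer tokens attend to all documents, which this sketch omits.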