Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often filter aggressively, discarding valuable information and reducing the reliability of generation. To address this problem, we propose SeCon-RAG, a two-stage semantic filtering and conflict-free framework for trustworthy RAG. In the first stage, we jointly apply semantic and cluster-based filtering, guided by the Entity-intent-relation extractor (EIRE). EIRE extracts entities, latent objectives, and entity relations from both the user query and the filtered documents, scores their semantic relevance, and selectively adds valuable documents to the clean retrieval database. In the second stage, we propose an EIRE-guided conflict-aware filtering module, which analyzes semantic consistency among the query, candidate answers, and retrieved knowledge before final answer generation and filters out internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG preserves useful knowledge while mitigating conflict contamination, achieving significant improvements in both generation robustness and output trustworthiness. Extensive experiments across various LLMs and datasets demonstrate that SeCon-RAG markedly outperforms state-of-the-art defense methods.
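The two-stage process described in the abstract can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `extract_terms` is a toy stand-in for EIRE (keyword overlap instead of entity, intent, and relation extraction), the cluster-based part of stage one is omitted, the threshold value is arbitrary, and stage two replaces the conflict-aware module with a plain support count over candidate answers.

```python
import re
from collections import Counter

# Hypothetical stopword list, just enough for the toy example below.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "who", "what", "by", "to", "and"}


def extract_terms(text):
    """Toy stand-in for EIRE: return the lowercase content words of a text."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def relevance(query, doc):
    """Fraction of query terms that also appear in the document."""
    q = extract_terms(query)
    return len(q & extract_terms(doc)) / len(q) if q else 0.0


def stage1_semantic_filter(query, docs, threshold=0.5):
    """Stage 1 (assumed form): keep documents whose relevance reaches the threshold."""
    return [d for d in docs if relevance(query, d) >= threshold]


def stage2_conflict_filter(docs, candidate_answers):
    """Stage 2 (assumed form): drop documents that support a candidate answer
    other than the one supported by the most documents."""
    support = Counter()
    for d in docs:
        for a in candidate_answers:
            if a.lower() in d.lower():
                support[a] += 1
    if not support:
        return docs  # no candidate is mentioned, so there is nothing to reconcile
    winner, _ = support.most_common(1)[0]
    rivals = [a for a in candidate_answers if a != winner]
    return [d for d in docs if not any(r.lower() in d.lower() for r in rivals)]


query = "Who wrote the novel Dracula?"
docs = [
    "Dracula is a novel that Bram Stoker wrote in 1897.",
    "Bram Stoker wrote the Gothic novel Dracula.",
    "Mary Shelley wrote the novel Dracula.",  # poisoned entry
    "The weather in Dublin is mild.",         # irrelevant entry
]
candidates = ["Bram Stoker", "Mary Shelley"]

relevant = stage1_semantic_filter(query, docs)
clean = stage2_conflict_filter(relevant, candidates)
print(clean)
```

In this toy run, stage one removes only the irrelevant document, and stage two removes the document that contradicts the better-supported answer. A support count of this kind fails when poisoned documents form the majority, which is one reason a real system would need a stronger consistency analysis than this sketch provides.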