Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments conducted on multiple large-scale datasets demonstrate that CorruptRAG achieves higher attack success rates than existing baselines.