Retrieval-Augmented Generation (RAG) systems improve the factual grounding of large language models (LLMs) but remain vulnerable to retrieval poisoning, where adversaries seed the corpus with manipulated content. Prior work largely evaluates this threat under a simplified single-attacker assumption. In practice, however, high-value or high-visibility queries attract multiple adversaries with conflicting objectives. Motivated by real cases, we introduce the setting of competing attacks, in which multiple attackers simultaneously attempt to steer the same or closely related query toward different targets. We formalize this threat model and propose competitive effectiveness, a metric that quantifies an attacker's advantage under competition. Extensive experiments show that many strategies that succeed in the single-attacker regime degrade markedly under competition, revealing performance inversions and highlighting the limits of conventional metrics such as attack success rate and F1. Furthermore, we present PoisonArena, a standardized framework and benchmark for evaluating poisoning attacks and defenses under realistic, multi-adversary conditions.