Dense retrievers are widely used in information retrieval and have been successfully extended to other knowledge-intensive applications of language models, such as Retrieval-Augmented Generation (RAG) systems. However, they have recently been shown to be vulnerable to corpus poisoning attacks, in which a malicious user injects a small fraction of adversarial passages into the retrieval corpus so that the system returns these passages among the top-ranked results for a broad set of user queries. Further study is needed to understand the extent to which these attacks could limit the deployment of dense retrievers in real-world applications. In this work, we propose Approximate Greedy Gradient Descent (AGGD), a new attack on dense retrieval systems that builds on the widely used HotFlip method for efficiently generating adversarial passages. We demonstrate that AGGD selects a higher-quality set of token-level perturbations than HotFlip by replacing its random token sampling with a more structured search. Experimentally, we show that our method achieves a high attack success rate across several datasets and retrievers, and that it generalizes to unseen queries and new domains. Notably, our method is particularly effective against the ANCE retrieval model, achieving attack success rates that are 15.24% and 17.44% higher than HotFlip's on the NQ and MS MARCO datasets, respectively. Additionally, we demonstrate AGGD's potential to replace HotFlip in other adversarial attacks, such as knowledge poisoning of RAG systems.
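To make the contrast between random sampling and a more structured search concrete, the following is a minimal sketch of a HotFlip-style token substitution step on a toy model. It is not the AGGD algorithm, whose search procedure is not specified in this abstract. Everything here is an assumption made for illustration: the toy encoder `encode` (a tanh of a weighted sum of token embeddings), the dot-product objective, and the helper names `position_gradients`, `hotflip_candidates`, `greedy_step`, and `attack`. The only part taken from HotFlip itself is the first-order estimate of the gain from swapping one token for another.

```python
import numpy as np

def encode(E, w, ids):
    """Toy passage encoder (an assumption): tanh of a weighted sum of token embeddings."""
    return np.tanh((w[:, None] * E[ids]).sum(axis=0))

def objective(E, w, ids, Q):
    """Total dot-product similarity between the passage and all queries."""
    return float((Q @ encode(E, w, ids)).sum())

def position_gradients(E, w, ids, Q):
    """Gradient of the objective w.r.t. the embedding at each position, shape (n, d)."""
    f = encode(E, w, ids)
    upstream = (1.0 - f ** 2) * Q.sum(axis=0)  # d objective / d pre-activation
    return w[:, None] * upstream[None, :]

def hotflip_candidates(E, ids, grads, pos, k):
    """Top-k replacement tokens at `pos`, ranked by first-order estimated gain."""
    gain = (E - E[ids[pos]]) @ grads[pos]      # one estimated gain per vocabulary token
    return np.argsort(-gain)[:k]

def greedy_step(E, w, ids, Q, positions, k=5):
    """Evaluate candidates exactly at the given positions; keep the best improving swap."""
    grads = position_gradients(E, w, ids, Q)
    best_ids, best_val = ids, objective(E, w, ids, Q)
    for pos in positions:
        for tok in hotflip_candidates(E, ids, grads, pos, k):
            trial = ids.copy()
            trial[pos] = tok
            val = objective(E, w, trial, Q)
            if val > best_val:
                best_ids, best_val = trial, val
    return best_ids, best_val

def attack(E, w, ids, Q, steps, rng=None, k=5):
    """With `rng`: one random position per step. Without: scan every position (illustrative)."""
    ids = ids.copy()
    val = objective(E, w, ids, Q)
    for _ in range(steps):
        n = len(ids)
        positions = [int(rng.integers(n))] if rng is not None else range(n)
        ids, val = greedy_step(E, w, ids, Q, positions, k)
    return ids, val

# Toy data: 50-token vocabulary, 8-dim embeddings, 6-token passage, 4 queries.
rng = np.random.default_rng(0)
E = 0.3 * rng.normal(size=(50, 8))
w = rng.uniform(0.5, 1.5, size=6)
Q = rng.normal(size=(4, 8))
start = rng.integers(50, size=6)

start_val = objective(E, w, start, Q)
_, random_val = attack(E, w, start, Q, steps=1, rng=np.random.default_rng(1))
_, scan_val = attack(E, w, start, Q, steps=1)
print(start_val, random_val, scan_val)
```

In this sketch a single full-scan step can never do worse than a single random-position step from the same starting passage, because it evaluates a superset of the same candidates. That property holds only for this toy setup and says nothing about the measured gains reported above, which come from the authors' experiments on real retrievers.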