Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack on dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. When these adversarial passages are inserted into a large retrieval corpus, we show that the attack is highly effective at fooling these systems into retrieving them for queries that were not seen by the attacker. More surprisingly, these adversarial passages generalize directly to out-of-domain queries and corpora with a high attack success rate. For instance, we find that 50 generated passages optimized on Natural Questions can mislead more than 94% of questions posed in financial documents or online forums. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised. Although different systems exhibit varying levels of vulnerability, we show that all of them can be successfully attacked by injecting up to 500 passages, a small fraction of a retrieval corpus containing millions of passages.
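To make the attack objective concrete, the following is a minimal sketch of perturbing discrete tokens to maximize similarity with a set of training queries. It is an illustration under stated assumptions, not the procedure used in this work: the encoder is a toy stand-in (a unit-normalized mean of random token embeddings), the search is an exhaustive greedy coordinate search over a small vocabulary, and all names (`encode`, `craft_passage`, `mean_similarity`) and sizes are hypothetical.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def encode(token_ids, emb):
    """Toy encoder (an assumption): unit-normalized mean of token embeddings."""
    return normalize(emb[token_ids].mean(axis=0))

def mean_similarity(passage_vec, query_vecs):
    """Average dot-product similarity between one passage and all queries."""
    return float((query_vecs @ passage_vec).mean())

def craft_passage(query_vecs, emb, init, sweeps=2):
    """Greedy coordinate search over discrete tokens.

    For each position in turn, try every vocabulary token there and keep
    the one that maximizes the mean similarity to the training queries.
    """
    passage = np.array(init, copy=True)
    # The mean similarity to all queries equals the dot product
    # with the mean query vector.
    target = query_vecs.mean(axis=0)
    for _ in range(sweeps):
        for pos in range(len(passage)):
            # Sum of embeddings at all other positions.
            rest = emb[passage].sum(axis=0) - emb[passage[pos]]
            # Row v is the passage vector if token v were placed at `pos`.
            # Summing instead of averaging gives the same unit vector.
            candidates = normalize(rest + emb)
            passage[pos] = int(np.argmax(candidates @ target))
    return passage

# Hypothetical toy setup: 200-token vocabulary, 16-dimensional embeddings,
# and 32 training queries clustered around a shared direction.
rng = np.random.default_rng(42)
emb = rng.normal(size=(200, 16))
center = rng.normal(size=16)
train_queries = normalize(center + 0.5 * rng.normal(size=(32, 16)))

start = rng.integers(0, 200, size=8)          # random initial passage
adv = craft_passage(train_queries, emb, start)

before = mean_similarity(encode(start, emb), train_queries)
after = mean_similarity(encode(adv, emb), train_queries)
print(f"mean similarity before: {before:.3f}, after: {after:.3f}")
```

Because the current token is always among the candidates at each position, the objective never decreases from one step to the next. A real dense retriever has a vocabulary of tens of thousands of tokens and a far more expensive encoder, so exhaustive search of this kind would be impractical there; an attack at that scale would need a cheaper way to shortlist candidate token replacements.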