Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word: the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average; under a comparable white-box setup, they match PRADA's rank and score boosts with far fewer edits. We also introduce new diagnostic metrics that analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
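The heuristic variant of the attack can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy term-overlap scorer stands in for the actual neural re-ranker (BERT or monoT5), and `single_word_insert_attack` is a hypothetical helper that greedily tries every insertion position for the query-center word and keeps the highest-scoring variant.

```python
from typing import Callable, List

def toy_score(query: str, doc_tokens: List[str]) -> float:
    """Toy relevance proxy: query-term coverage plus a small bonus for
    an early first match. A real attack would query the neural ranker."""
    q_terms = set(query.lower().split())
    hits = [i for i, t in enumerate(doc_tokens) if t.lower() in q_terms]
    if not hits:
        return 0.0
    coverage = len({doc_tokens[i].lower() for i in hits}) / len(q_terms)
    return coverage + 1.0 / (1 + min(hits))

def single_word_insert_attack(query: str, doc: str, center_word: str,
                              score_fn: Callable[[str, List[str]], float]) -> str:
    """Insert `center_word` at the position that maximizes the ranker score.
    Hypothetical sketch of the heuristic single-word attack."""
    tokens = doc.split()
    best_tokens, best = tokens, score_fn(query, tokens)
    for i in range(len(tokens) + 1):
        candidate = tokens[:i] + [center_word] + tokens[i:]
        s = score_fn(query, candidate)
        if s > best:
            best, best_tokens = s, candidate
    return " ".join(best_tokens)
```

The white-box variant would replace the exhaustive position scan with gradient information from the ranker to identify influential insertion points directly, reducing the number of model calls per document.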