Neural ranking models (NRMs) have undergone significant development and have become integral components of information retrieval (IR) systems. Unfortunately, recent research has revealed that NRMs are vulnerable to adversarial document manipulations, which malicious search engine optimization practitioners could exploit. While progress in adversarial attack strategies helps identify potential weaknesses of NRMs before deployment, defensive measures against such attacks, such as the detection of adversarial documents, remain inadequately explored. To fill this gap, this paper establishes a benchmark dataset to facilitate the investigation of adversarial ranking defense and introduces two types of detection tasks for adversarial documents. A comprehensive investigation of several detection baselines is conducted, covering the examination of spamicity, perplexity, and linguistic acceptability, as well as supervised classifiers. Experimental results demonstrate that a supervised classifier can effectively mitigate known attacks but performs poorly against unseen attacks. Furthermore, such a classifier should avoid using the query text, so that it does not learn to classify on relevance, which might lead to relevant documents being inadvertently discarded.