Adversarial attacks have gained traction as a way to identify potential vulnerabilities in neural ranking models (NRMs). However, current attack methods often introduce grammatical errors, nonsensical expressions, or incoherent text fragments that are easily detected. They also rely heavily on a well-imitated surrogate NRM to guarantee the attack effect, which makes them difficult to use in practice. To address these issues, we propose Imperceptible DocumEnt Manipulation (IDEM), a framework that produces adversarial documents that are less noticeable to both algorithms and humans. IDEM instructs a well-established generative language model, such as BART, to generate connection sentences without introducing easy-to-detect errors, and employs a separate position-wise merging strategy to balance the relevance and coherence of the perturbed text. Experimental results on the popular MS MARCO benchmark demonstrate that IDEM can outperform strong baselines while preserving the fluency and correctness of the target documents, as evidenced by automatic and human evaluations. Furthermore, separating adversarial text generation from the surrogate NRM makes IDEM more robust and less sensitive to the quality of the surrogate NRM.
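To make the position-wise merging idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the candidate connection sentences have already been produced by a generative model, and it replaces both the surrogate NRM and the coherence measure with toy lexical scorers. The names `toy_relevance`, `toy_coherence`, `merge_position_wise`, and the `weight` parameter are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional, Set, Tuple


def _terms(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_relevance(query: str, sentences: List[str]) -> float:
    """Hypothetical stand-in for a surrogate NRM (assumption, not a real ranker):
    query-term overlap per sentence, discounted by sentence position."""
    q = _terms(query)
    return sum(len(q & _terms(s)) / (i + 1) for i, s in enumerate(sentences))


def toy_coherence(sentences: List[str], pos: int) -> float:
    """Hypothetical coherence proxy: mean Jaccard overlap between the
    sentence at `pos` and its immediate neighbours."""
    cur = _terms(sentences[pos])
    neighbours = [sentences[j] for j in (pos - 1, pos + 1) if 0 <= j < len(sentences)]
    if not neighbours:
        return 0.0
    overlaps = [
        len(cur & _terms(n)) / max(1, len(cur | _terms(n))) for n in neighbours
    ]
    return sum(overlaps) / len(overlaps)


def merge_position_wise(
    query: str, sentences: List[str], candidates: List[str], weight: float = 0.5
) -> Tuple[float, int, List[str]]:
    """Try every candidate at every insertion position and keep the merged
    document with the highest relevance + weight * coherence score."""
    best: Optional[Tuple[float, int, List[str]]] = None
    for pos in range(len(sentences) + 1):
        for cand in candidates:
            merged = sentences[:pos] + [cand] + sentences[pos:]
            score = toy_relevance(query, merged) + weight * toy_coherence(merged, pos)
            if best is None or score > best[0]:
                best = (score, pos, merged)
    if best is None:
        raise ValueError("at least one candidate sentence is required")
    return best


# Toy usage with made-up text.
query = "how much electricity do solar panels produce"
doc = [
    "Solar panels convert sunlight into electricity.",
    "Installation costs have fallen in recent years.",
]
candidates = ["Many homeowners ask how much electricity solar panels produce."]
score, pos, merged = merge_position_wise(query, doc, candidates)
```

In this sketch the candidate sentences are fixed inputs and are scored only after generation, which mirrors the separation of text generation from the surrogate ranker described above. A real system would replace the two toy scorers with an actual ranking model and a learned coherence measure.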