Many adversarial attacks target natural language processing systems, and most succeed by modifying individual tokens of a document. Although each attack appears unique, every one is fundamentally a distinct configuration of four components: a goal function, allowable transformations, a search method, and constraints. In this survey, we systematically present the components used throughout the literature within an attack-independent framework that allows for easy comparison and categorisation. Our work aims to serve as a comprehensive guide for newcomers to the field and to spark targeted research into refining the individual attack components.
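To make the four-component view concrete, the following is a minimal sketch of a token-level attack assembled from a goal function, a transformation, a constraint, and a search method. Everything in it is an illustrative assumption rather than something taken from the survey: the keyword-counting `toy_classifier`, the hand-written `SUBSTITUTES` table, the edit-budget constraint, and the breadth-first search are hypothetical stand-ins chosen only to keep the example self-contained.

```python
from typing import Iterable, List, Optional

Tokens = List[str]

# Hypothetical toy victim model (an assumption for illustration):
# predicts "pos" when positive keywords outnumber negative ones.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def toy_classifier(tokens: Tokens) -> str:
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score > 0 else "neg"


# Component 1, goal function: untargeted misclassification.
def goal_reached(original: Tokens, candidate: Tokens) -> bool:
    return toy_classifier(candidate) != toy_classifier(original)


# Component 2, transformations: single-token substitutions drawn
# from a hand-written (hypothetical) substitution table.
SUBSTITUTES = {"good": ["fine", "decent"], "great": ["notable"]}


def transformations(tokens: Tokens) -> Iterable[Tokens]:
    for i, tok in enumerate(tokens):
        for sub in SUBSTITUTES.get(tok, []):
            yield tokens[:i] + [sub] + tokens[i + 1:]


# Component 3, constraint: at most `max_changes` tokens may differ
# from the original document.
def within_budget(original: Tokens, candidate: Tokens, max_changes: int) -> bool:
    return sum(a != b for a, b in zip(original, candidate)) <= max_changes


# Component 4, search method: breadth-first search over successive
# substitutions, one extra edit per level.
def attack(original: Tokens, max_changes: int = 2) -> Optional[Tokens]:
    frontier = [original]
    for _ in range(max_changes):
        next_frontier = []
        for current in frontier:
            for candidate in transformations(current):
                if not within_budget(original, candidate, max_changes):
                    continue
                if goal_reached(original, candidate):
                    return candidate
                next_frontier.append(candidate)
        frontier = next_frontier
    return None  # no adversarial example found within the budget


print(attack(["a", "good", "film"]))
print(attack(["good", "great", "film"]))
```

Because each component is a separate function, swapping one of them (for example, replacing the breadth-first search with a greedy search, or the edit budget with a similarity threshold) yields a different attack without touching the other three, which is the kind of comparison the framework is meant to support.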