Many adversarial attacks now exist for natural language processing systems. The vast majority succeed by modifying individual tokens in a document; we call these token-modification attacks. Each token-modification attack is defined by a specific combination of fundamental components, such as a constraint on the adversary or a particular search algorithm. Motivated by this observation, we survey existing token-modification attacks and extract the components of each. We structure the survey around an attack-independent framework, which yields an effective categorisation of the field and allows components to be compared easily. This survey aims to guide researchers new to the field and to spark further research into individual attack components.