Natural language processing models based on neural networks are vulnerable to adversarial examples. These adversarial examples are imperceptible to human readers but can mislead models into making wrong predictions. In a black-box setting, an attacker can fool the model without knowing the model's parameters or architecture. Previous work on word-level attacks widely uses a single semantic space and greedy search as the search strategy. However, these methods fail to balance attack success rate, quality of adversarial examples, and time consumption. In this paper, we propose BeamAttack, a textual attack algorithm that uses mixed semantic spaces and an improved beam search to craft high-quality adversarial examples. Extensive experiments demonstrate that BeamAttack improves the attack success rate while substantially reducing the number of queries and the time required, e.g., it improves the attack success rate by up to 7\% over greedy search when attacking examples from the MR dataset. Compared with heuristic search, BeamAttack saves up to 85\% of model queries while achieving a competitive attack success rate. The adversarial examples crafted by BeamAttack are highly transferable and can effectively improve a model's robustness during adversarial training. Code is available at https://github.com/zhuhai-ustc/beamattack/tree/master
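To make the contrast between greedy search and beam search concrete, the following is a minimal sketch of a black-box, word-level substitution attack driven by beam search. It is not the BeamAttack implementation from the repository above: the classifier `toy_positive_score`, the substitution table `SYNONYMS`, the word weights, and the function `beam_attack` with its parameters are all hypothetical stand-ins chosen for illustration. The attacker only sees the score the model returns for each query.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical substitution table. In a real attack the candidates would come
# from one or more semantic spaces (e.g. synonym dictionaries, embeddings).
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["big", "notable"],
    "movie": ["film"],
}

# Hypothetical weights used by the toy victim model below.
POSITIVE_WEIGHTS: Dict[str, float] = {
    "good": 2.0, "great": 2.0, "fine": 0.5, "decent": 0.5, "notable": 0.3,
}


def toy_positive_score(words: List[str]) -> float:
    """Toy black-box victim: returns a pseudo-probability of 'positive'."""
    total = sum(POSITIVE_WEIGHTS.get(w, 0.0) for w in words)
    return total / (total + 1.0)


def beam_attack(
    words: List[str],
    score_fn: Callable[[List[str]], float],
    synonyms: Dict[str, List[str]],
    beam_width: int = 2,
    max_steps: int = 5,
    threshold: float = 0.5,
) -> Tuple[Optional[List[str]], int]:
    """Search for a substituted sentence whose score drops below threshold.

    Returns (adversarial_words or None, number_of_model_queries).
    """
    queries = 1
    start_score = score_fn(words)
    if start_score < threshold:
        return list(words), queries

    beam: List[Tuple[float, List[str]]] = [(start_score, list(words))]
    seen = {tuple(words)}

    for _ in range(max_steps):
        candidates: List[Tuple[float, List[str]]] = []
        for _, current in beam:
            for i, word in enumerate(current):
                for sub in synonyms.get(word, []):
                    new_words = current[:i] + [sub] + current[i + 1:]
                    key = tuple(new_words)
                    if key in seen:  # avoid re-querying the same sentence
                        continue
                    seen.add(key)
                    score = score_fn(new_words)
                    queries += 1
                    if score < threshold:  # prediction flipped: stop early
                        return new_words, queries
                    candidates.append((score, new_words))
        if not candidates:
            break
        # Keep the beam_width candidates that hurt the model's score the most.
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:beam_width]

    return None, queries


sentence = "a good and great movie".split()
adversarial, used_queries = beam_attack(sentence, toy_positive_score, SYNONYMS)
print(adversarial, used_queries)
```

Setting `beam_width=1` reduces this sketch to greedy search, which keeps only the single best candidate at each step. A wider beam keeps several partial substitutions alive, which can find successful attacks that greedy search misses, at the cost of more model queries per step.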