In recent years, deep neural networks have been shown to lack robustness and to be vulnerable to adversarial perturbations of their input data. Strong adversarial attacks have been proposed for tasks in both computer vision and Natural Language Processing (NLP). In response, several defense mechanisms have been proposed to keep these networks from failing. Defending neural networks against adversarial attacks is important in its own right: the goal is to ensure that a model's prediction does not change when its input is perturbed. Numerous adversarial defense methods have recently been proposed for NLP tasks such as text classification, named entity recognition, and natural language inference. Some of these methods serve not only as defenses against adversarial attacks but also as regularization mechanisms during training, helping to prevent overfitting. This survey reviews the methods proposed for adversarial defense in NLP in recent years and organizes them under a novel taxonomy. It also highlights the fragility of advanced deep neural networks in NLP and the challenges in defending them.