In recent years, it has become increasingly evident that deep neural networks are not robust to adversarial perturbations of their input data, which leaves them vulnerable to attack. Researchers have proposed strong adversarial attacks against both computer vision and Natural Language Processing (NLP) models. In response, many defense mechanisms have been proposed to keep these networks from failing. The goal of an adversarial defense is to ensure that a model's predictions remain unchanged even when its input is perturbed. Several adversarial defense methods have been proposed for NLP, covering tasks such as text classification, named entity recognition, and natural language inference. Some of these methods not only defend neural networks against adversarial attacks but also act as a regularizer during training, helping to prevent overfitting. This survey reviews the adversarial defense methods proposed for NLP over the past few years and organizes them under a novel taxonomy. It also highlights the fragility of advanced deep neural networks in NLP and the challenges involved in defending them.