Text Adversarial Purification as Defense against Adversarial Attacks

Adversarial purification is a successful defense mechanism against adversarial attacks that does not require knowledge of the form of the incoming attack. Generally, adversarial purification aims to remove adversarial perturbations so that correct predictions can be made on the recovered clean samples. Adversarial purification has succeeded in computer vision, where it incorporates generative models such as energy-based models and diffusion models. Despite this, purification as a defense strategy against textual adversarial attacks is rarely explored. In this work, we introduce a novel adversarial purification method that focuses on defending against textual adversarial attacks. With the help of language models, we inject noise by masking input texts and then reconstruct the masked texts with masked language models. In this way, we construct an adversarial purification process for textual models against the most widely used adversarial attacks, which work by word substitution. We test our proposed adversarial purification method on several strong adversarial attack methods, including TextFooler and BERT-Attack. Experimental results indicate that the purification algorithm can successfully defend against strong word-substitution attacks.
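The mask-and-reconstruct loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. `toy_fill_mask` stands in for a real masked language model and uses a tiny bigram table. `toy_classifier` is a keyword classifier. Aggregating several purified copies by majority vote in `purified_predict` is an assumption and is not stated in the abstract. All names here are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"


def mask_tokens(tokens: List[str], mask_rate: float, rng: random.Random) -> List[str]:
    """Inject noise: replace a random subset of tokens with the mask symbol."""
    n = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    masked = list(tokens)
    for pos in rng.sample(range(len(tokens)), n):
        masked[pos] = MASK
    return masked


def toy_fill_mask(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a masked language model such as BERT.

    Fills masks left to right from a tiny bigram table keyed on the previous
    (already reconstructed) token.
    """
    bigram = {"<s>": "the", "the": "movie", "movie": "was", "was": "great"}
    filled: List[str] = []
    for tok in tokens:
        if tok == MASK:
            prev = filled[-1] if filled else "<s>"
            tok = bigram.get(prev, "[UNK]")
        filled.append(tok)
    return filled


def toy_classifier(tokens: List[str]) -> str:
    """Hypothetical keyword classifier that a word-substitution attack targets."""
    return "positive" if "great" in tokens else "negative"


def purified_predict(
    text: str,
    classify: Callable[[List[str]], str],
    fill_mask: Callable[[List[str]], List[str]],
    num_samples: int = 5,
    mask_rate: float = 0.25,
    seed: int = 0,
) -> str:
    """Mask, reconstruct, and classify several times; return the majority label (assumed aggregation)."""
    rng = random.Random(seed)
    tokens = text.split()
    votes: Counter = Counter()
    for _ in range(num_samples):
        recovered = fill_mask(mask_tokens(tokens, mask_rate, rng))
        votes[classify(recovered)] += 1
    return votes.most_common(1)[0][0]


# An attacker swapped "great" for "gr8", fooling the keyword classifier.
adversarial = "the movie was gr8"
print(toy_classifier(adversarial.split()))
print(purified_predict(adversarial, toy_classifier, toy_fill_mask, num_samples=3, mask_rate=1.0))
```

The first call prints `negative`. With `mask_rate=1.0`, every token is regenerated, so the substituted word is replaced and the second call prints `positive`. At smaller mask rates, the substituted word is masked only in some samples. That is why this sketch draws several noisy copies and votes.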
