Large-scale pre-trained language models have shown outstanding performance on a variety of NLP tasks. However, they are also known to be brittle against carefully crafted adversarial examples, which has led to increasing interest in probing the adversarial robustness of NLP systems. We introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked inference (MI) to improve the adversarial robustness of NLP systems. RS transforms a classifier into a smoothed classifier to obtain robust representations, whereas MI forces a model to exploit the surrounding context of a masked token in an input sequence. RSMI improves adversarial robustness by 2 to 3 times over existing state-of-the-art methods on benchmark datasets. We also perform an in-depth qualitative analysis to validate the effectiveness of the different stages of RSMI and probe the impact of its components through extensive ablations. Having empirically demonstrated the stability of RSMI, we put it forward as a practical method for robustly training large-scale NLP models. Our code and datasets are available at https://github.com/Han8931/rsmi_nlp
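The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); it only assumes a toy lexicon-based classifier, Gaussian noise added to per-token scores as a stand-in for smoothing, and uniformly random mask positions. All names here (`LEXICON`, `noisy_classify`, `mask_random_tokens`, `smoothed_masked_predict`) are hypothetical.

```python
import random
from collections import Counter

MASK = "[MASK]"  # hypothetical mask symbol

# Hypothetical toy lexicon standing in for a trained model's knowledge.
LEXICON = {"great": 1.0, "excellent": 1.0, "terrible": -1.0, "awful": -1.0}


def noisy_classify(tokens, sigma, rng):
    """Toy classifier: sums per-token scores perturbed by Gaussian noise.

    The noise plays the role of the random perturbation used in
    randomized smoothing; masked tokens contribute a score of zero.
    """
    score = sum(LEXICON.get(t, 0.0) + rng.gauss(0.0, sigma) for t in tokens)
    return "pos" if score > 0 else "neg"


def mask_random_tokens(tokens, n_mask, rng):
    """Return a copy of tokens with n_mask random positions replaced by MASK."""
    n_mask = min(n_mask, len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else t for i, t in enumerate(tokens)]


def smoothed_masked_predict(tokens, n_samples=25, n_mask=1, sigma=0.1, seed=0):
    """Majority vote over randomly masked, noise-perturbed copies of the input."""
    rng = random.Random(seed)
    votes = Counter(
        noisy_classify(mask_random_tokens(tokens, n_mask, rng), sigma, rng)
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    sentence = "the movie was great and the acting was excellent".split()
    print(smoothed_masked_predict(sentence))
```

Because the final label is a vote over many perturbed copies, changing a single token of the input shifts only part of the vote, which is the intuition behind combining smoothing with masking. The choice of which tokens to mask and where to inject noise is simplified here and should not be read as the procedure used in the paper.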