Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as vulnerability to adversarial inputs and poor model calibration. Several perspectives of robustness for LMs have been studied independently, but a unified consideration across multiple perspectives is lacking. In this paper, we propose Robustifying LMs via Adversarial perturbation with Selective Training (RoAST), a simple yet effective fine-tuning technique that enhances the multi-perspective robustness of LMs in a unified way. RoAST incorporates two important sources of model robustness: robustness to perturbed inputs and the generalizable knowledge in pre-trained LMs. Specifically, RoAST introduces adversarial perturbation during fine-tuning, while the model parameters are selectively updated according to their relative importance to minimize unnecessary deviation. Under a unified evaluation of fine-tuned LMs that incorporates four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs, which indicates its usefulness in practice.
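To make the two ingredients named in the abstract concrete, the following is a minimal sketch of one training step that combines an adversarial input perturbation with a selective parameter update. It is not the authors' implementation. It assumes a toy logistic-regression model in place of an LM, a single-step L2-normalized gradient perturbation of radius `eps`, an equal mix of clean and adversarial gradients, and squared gradient magnitude as the importance score; the abstract specifies none of these. All function and parameter names (`adversarial_input`, `selective_mask`, `selective_adversarial_step`, `eps`, `keep_fraction`, `lr`) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grads(w, x, y):
    """Logistic loss for one example, with gradients w.r.t. weights and input."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    err = p - y
    grad_w = [err * xi for xi in x]
    grad_x = [err * wi for wi in w]
    return loss, grad_w, grad_x


def adversarial_input(w, x, y, eps):
    """Assumed perturbation: one step of size eps along the input gradient."""
    _, _, grad_x = loss_and_grads(w, x, y)
    norm = math.sqrt(sum(g * g for g in grad_x))
    if norm == 0.0:
        return list(x)
    return [xi + eps * g / norm for xi, g in zip(x, grad_x)]


def selective_mask(grad_w, keep_fraction):
    """Assumed importance score: squared gradient; keep the top fraction."""
    k = max(1, int(round(keep_fraction * len(grad_w))))
    scores = [g * g for g in grad_w]
    top = sorted(range(len(grad_w)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0.0] * len(grad_w)
    for i in top:
        mask[i] = 1.0
    return mask


def selective_adversarial_step(w, batch, eps=0.1, keep_fraction=0.5, lr=0.5):
    """One update: average clean and adversarial gradients, then mask them."""
    grad = [0.0] * len(w)
    for x, y in batch:
        x_adv = adversarial_input(w, x, y, eps)
        _, g_clean, _ = loss_and_grads(w, x, y)
        _, g_adv, _ = loss_and_grads(w, x_adv, y)
        for i in range(len(w)):
            grad[i] += (g_clean[i] + g_adv[i]) / (2 * len(batch))
    mask = selective_mask(grad, keep_fraction)
    # Parameters with mask 0 keep their current (pre-trained) values.
    new_w = [wi - lr * m * g for wi, m, g in zip(w, mask, grad)]
    return new_w, mask


if __name__ == "__main__":
    w0 = [0.5, -0.3, 0.1, 0.0]
    data = [([1.0, 2.0, -1.0, 0.5], 1), ([-1.0, 0.5, 2.0, 1.0], 0)]
    w1, mask = selective_adversarial_step(w0, data)
    print("mask:", mask)
    print("updated weights:", w1)
```

In this sketch, weights whose mask entry is 0 stay at their starting values, which is one simple way to limit deviation from the pre-trained parameters. A real LM would need automatic differentiation and perturbations applied in embedding space, and the importance criterion and masking rule may differ from the ones assumed here.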