State-of-the-art few-shot learning (FSL) methods leverage prompt-based fine-tuning to obtain remarkable results on natural language understanding (NLU) tasks. While most prior FSL work focuses on improving downstream task performance, the adversarial robustness of such methods remains poorly understood. In this work, we conduct an extensive study of several state-of-the-art FSL methods to assess their robustness to adversarial perturbations. To better understand how various factors affect robustness (or the lack of it), we compare prompt-based FSL methods against fully fine-tuned models along dimensions such as the use of unlabeled data, multiple prompts, the number of few-shot examples, and model size and type. Our results on six GLUE tasks indicate that, compared to fully fine-tuned models, vanilla FSL methods suffer a notable relative drop in task performance (i.e., are less robust) under adversarial perturbations. However, using (i) unlabeled data for prompt-based FSL and (ii) multiple prompts flips this trend. We further demonstrate that increasing the number of few-shot examples and the model size improves the adversarial robustness of vanilla FSL methods. Broadly, our work sheds light on the adversarial robustness evaluation of prompt-based FSL methods for NLU tasks.
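The abstract reports robustness as the relative drop in task performance under adversarial perturbations. A minimal Python sketch of such a metric follows, assuming it is computed as (clean accuracy - adversarial accuracy) / clean accuracy. The function name `relative_drop` and the accuracy values are hypothetical and purely illustrative; they are not results from this study.

```python
def relative_drop(clean_acc: float, adv_acc: float) -> float:
    """Relative performance drop (in %) from clean to adversarial evaluation.

    Assumed definition: (clean - adversarial) / clean, scaled to percent.
    A larger value means the model is less robust.
    """
    if clean_acc <= 0:
        raise ValueError("clean_acc must be positive")
    return 100.0 * (clean_acc - adv_acc) / clean_acc


# Hypothetical (clean, adversarial) accuracies, for illustration only.
models = {
    "fine_tuned": (90.0, 72.0),
    "prompt_fsl": (80.0, 56.0),
}

for name, (clean, adv) in models.items():
    # Prints e.g. "fine_tuned: 20.0% relative drop"
    print(f"{name}: {relative_drop(clean, adv):.1f}% relative drop")
```

Normalizing by clean accuracy puts models with different clean accuracies on the same scale, which is presumably why a relative rather than an absolute drop is used when comparing FSL and fully fine-tuned models.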