Discrete prompts have been used for fine-tuning pre-trained language models for diverse NLP tasks. In particular, automatic methods that generate discrete prompts from a small set of training instances have reported superior performance. However, a closer look at the learnt prompts reveals that they contain noisy and counter-intuitive lexical constructs that would not be encountered in manually written prompts. This raises an important yet understudied question regarding the robustness of automatically learnt discrete prompts when used in downstream tasks. To address this question, we conduct a systematic study of the robustness of discrete prompts by applying carefully designed perturbations to an application that uses AutoPrompt and then measuring performance on two Natural Language Inference (NLI) datasets. Our experimental results show that although the discrete prompt-based method remains relatively robust against perturbations to NLI inputs, it is highly sensitive to other types of perturbations, such as shuffling and deletion of prompt tokens. Moreover, it generalizes poorly across different NLI datasets. We hope our findings will inspire future work on robust discrete prompt learning.
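The two prompt-level perturbations named in the abstract, shuffling and deletion of prompt tokens, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the seeding scheme, and the placeholder token list are all assumptions made for illustration, and the abstract does not specify how many tokens are deleted or how positions are chosen.

```python
import random


def shuffle_prompt_tokens(tokens, seed=0):
    """Return a copy of the prompt tokens in a random order (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def delete_prompt_tokens(tokens, n_delete=1, seed=0):
    """Return a copy with n_delete randomly chosen tokens removed.

    The relative order of the remaining tokens is preserved. Choosing the
    positions uniformly at random is an assumption, not the paper's protocol.
    """
    if not 0 <= n_delete <= len(tokens):
        raise ValueError("n_delete must be between 0 and len(tokens)")
    rng = random.Random(seed)
    drop = set(rng.sample(range(len(tokens)), n_delete))
    return [tok for i, tok in enumerate(tokens) if i not in drop]


# Placeholder tokens standing in for a learnt discrete prompt (not real AutoPrompt output).
prompt = ["alpha", "beta", "gamma", "delta", "epsilon"]
shuffled = shuffle_prompt_tokens(prompt, seed=1)
shortened = delete_prompt_tokens(prompt, n_delete=2, seed=1)
```

In a robustness study of this kind, the perturbed prompt would replace the original one at evaluation time, and the change in task accuracy would be compared against the unperturbed baseline.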