Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Using machine translation as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models such as GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show that SAP is effective on question answering and summarization. For the first time, our results demonstrate that prompt-based learning is an emergent property of a broader class of language models, not only of unidirectional ones.
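The abstract does not spell out how SAP works, so the following is only a toy sketch of one plausible reading of "sequential autoregressive prompting", under stated assumptions: a translation task is reformulated as a few-shot prompt, a bidirectional infilling model is asked to fill a mask placeholder appended to the end of that prompt, only the first word of its fill is kept and appended, and the loop repeats until the model produces nothing before a stop marker. The prompt format, `MASK`, `build_prompt`, `sequential_prompting`, and `toy_fill_mask` are all hypothetical; `toy_fill_mask` is a hard-coded stand-in for a real model (mT5, for example, marks infilling positions with sentinel tokens such as `<extra_id_0>`), not an implementation of one.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder for the model's span-infilling token.
MASK = "<mask>"


def build_prompt(examples: List[Tuple[str, str]], source: str) -> str:
    """Reformulate translation as a few-shot natural-language prompt.

    The "English:/French:" layout is an illustrative assumption.
    """
    blocks = [f"English: {src}\nFrench: {tgt}" for src, tgt in examples]
    blocks.append(f"English: {source}\nFrench:")
    return "\n\n".join(blocks)


def sequential_prompting(
    fill_mask: Callable[[str], str],
    prompt: str,
    max_words: int = 20,
    stop: str = "\n",
) -> str:
    """Generate one word at a time with an infilling model.

    Each step appends the words generated so far plus a trailing mask,
    asks the model to fill the mask, and keeps only the first word of
    the fill (cut at the stop marker) before re-prompting.
    """
    generated: List[str] = []
    for _ in range(max_words):
        context = prompt + "".join(" " + w for w in generated)
        fill = fill_mask(context + " " + MASK)
        words = fill.split(stop)[0].split()
        if not words:  # the model signalled the end of the completion
            break
        generated.append(words[0])
    return " ".join(generated)


def toy_fill_mask(text: str) -> str:
    """Hypothetical stand-in model that always "knows" one fixed answer.

    It looks at what follows the last "French:" cue and returns the
    remaining target words, ending with a newline.
    """
    target = ["le", "chat", "dort"]
    done = text.rsplit("French:", 1)[1].replace(MASK, "").split()
    rest = target[len(done):]
    return " ".join(rest) + "\n" if rest else "\n"


if __name__ == "__main__":
    p = build_prompt([("the dog runs", "le chien court")], "the cat sleeps")
    print(sequential_prompting(toy_fill_mask, p))
```

Keeping only the first word per step, rather than the whole fill, is what makes the loop autoregressive in this sketch: every new word is conditioned on all previously accepted words, which a single one-shot infill would not guarantee.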