Few-shot learning allows pre-trained language models to adapt to downstream tasks using only a limited number of training examples. However, practical applications are limited when all model parameters must be optimized. In this work, we apply a new technique for parameter-efficient few-shot learning while adopting a strict definition of parameter efficiency. Our training method combines 1) intermediate training by reformulating natural language tasks as entailment tasks \cite{wang_entailment_2021} and 2) differentiable optimization of template and label tokens \cite{zhang_differentiable_2021}. We quantify the tradeoff between parameter efficiency and performance in the few-shot regime and propose a simple, model-agnostic approach that can be extended to any task. By achieving competitive performance while optimizing only 3\% of a model's parameters and allowing for batched inference, we enable more efficient practical deployment of models.