Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving no-augmentation methods by 5+ average points, and outperforming augmentation methods by 3+ average points.
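The weighted maximum likelihood idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so this assumes the objective is a per-token weighted negative log-likelihood. In FewGen the weights are learned through a discriminative meta-learning objective; here they are fixed, hypothetical inputs, and the function name `weighted_nll` is illustrative only, not taken from the paper.

```python
import math


def weighted_nll(token_log_probs, token_weights):
    """Weighted negative log-likelihood of a single sequence.

    token_log_probs: log-probability the generator assigns to each token.
    token_weights:   one non-negative weight per token (assumed given here;
                     in the paper they come from a meta-learning objective).
    """
    if len(token_log_probs) != len(token_weights):
        raise ValueError("need exactly one weight per token")
    return -sum(w * lp for w, lp in zip(token_weights, token_log_probs))


# Hypothetical per-token probabilities for a three-token sequence.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]

# Uniform weights recover the ordinary (unweighted) NLL.
uniform = weighted_nll(log_probs, [1.0, 1.0, 1.0])

# Up-weighting the second token makes its likelihood dominate the loss.
reweighted = weighted_nll(log_probs, [0.2, 2.0, 0.8])
```

With all weights equal to 1 the loss reduces to standard maximum likelihood training. Non-uniform weights shift the training signal toward the tokens judged more label-discriminative.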


Few-Shot Learning

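Few-shot learning (FSL) addresses the problem of improving a neural network's performance when only a small amount of data is available. One class of methods commonly used in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, called meta-training and meta-testing respectively.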
