Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): they can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches, which learn only from the small training set, still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: we first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large number of novel training samples that augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood, where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples, with regularization for better generalization and stability. Our approach, FewGen, achieves better overall results across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving on no-augmentation methods by 5+ average points and outperforming augmentation methods by 3+ average points.
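To make the weighted maximum likelihood idea concrete, the following is a minimal sketch of a token-weighted negative log-likelihood for a single sequence. It is an illustration under stated assumptions, not the FewGen implementation: the function name `weighted_nll` is hypothetical, the token log-probabilities are toy values rather than PLM outputs, and the weights are set by hand here, whereas the abstract describes them as adjusted automatically by a discriminative meta-learning objective.

```python
import math
from typing import Sequence


def weighted_nll(token_log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted negative log-likelihood of one sequence.

    Each token's log-probability is scaled by its weight before summing.
    With all weights equal to 1, this reduces to the standard (unweighted)
    maximum likelihood loss for the sequence.
    """
    if len(token_log_probs) != len(weights):
        raise ValueError("token_log_probs and weights must have the same length")
    return -sum(w * lp for w, lp in zip(weights, token_log_probs))


# Toy log-probabilities for a three-token sequence (assumed values, not model outputs).
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]

# Uniform weights recover the ordinary sequence-level negative log-likelihood.
uniform_loss = weighted_nll(log_probs, [1.0, 1.0, 1.0])

# Hand-picked weights (an assumption for illustration) that emphasize the
# second token, as a stand-in for a token judged more label-discriminative.
emphasized_loss = weighted_nll(log_probs, [0.5, 2.0, 0.5])

print(f"uniform weights:    {uniform_loss:.4f}")
print(f"emphasized weights: {emphasized_loss:.4f}")
```

Raising a token's weight increases its contribution to the loss and therefore to the gradient, which is how a weighting scheme can steer the generator toward tokens that matter more for distinguishing labels.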