Scaled Prompt-Tuning for Few-Shot Natural Language Generation

As Large Language Models (LLMs) grow, they demonstrate stronger language understanding and generation capabilities, but the memory demand and computation cost of fine-tuning them on downstream tasks are non-negligible. Fine-tuning also generally requires a certain amount of data for each task, and the cost of collecting that data is a further concern in real-world applications. In this work, we focus on Parameter-Efficient Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG), which freeze most parameters of an LLM and tune a small subset of parameters in few-shot settings, so that memory footprint, training cost, and labeling cost are reduced while performance is maintained or even improved. We propose a Scaled Prompt-Tuning (SPT) method that surpasses conventional Prompt-Tuning (PT) in performance and generalization ability without an obvious increase in training cost. A further study of intermediate SPT suggests the superior transferability of SPT in few-shot scenarios, providing a recipe for data-deficient and computation-limited circumstances. Moreover, a comprehensive comparison of existing PEFT methods reveals that certain approaches reported in prior studies to perform decently at modest training cost, such as Prefix-Tuning, can struggle in few-shot NLG tasks, especially on challenging datasets.
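To make the "freeze most parameters, tune a small subset" idea concrete, here is a minimal pure-Python sketch of a soft prompt that is prepended to a frozen model's input embeddings. The abstract does not say how SPT applies its scaling, so the per-dimension `scale` vector below is an assumption made purely for illustration, and the names `ScaledSoftPrompt` and `trainable_fraction` are hypothetical. It should not be read as the authors' method.

```python
import random


class ScaledSoftPrompt:
    """Toy soft prompt with a learnable scale (hypothetical layout, not the paper's)."""

    def __init__(self, num_tokens, hidden_size, seed=0):
        rng = random.Random(seed)
        # Trainable: one embedding vector per virtual prompt token.
        self.prompt = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_tokens)
        ]
        # Trainable (assumed): one scale factor per hidden dimension, initialised to 1.
        self.scale = [1.0] * hidden_size

    def num_trainable(self):
        """Count the parameters an optimizer would update."""
        return len(self.prompt) * len(self.scale) + len(self.scale)

    def prepend_to(self, input_embeddings):
        """Scale the prompt and place it in front of the frozen model's input embeddings."""
        scaled = [[p * s for p, s in zip(row, self.scale)] for row in self.prompt]
        return scaled + [list(row) for row in input_embeddings]


def trainable_fraction(soft_prompt, frozen_params):
    """Share of all parameters that are trained when the backbone stays frozen."""
    trainable = soft_prompt.num_trainable()
    return trainable / (trainable + frozen_params)


if __name__ == "__main__":
    soft_prompt = ScaledSoftPrompt(num_tokens=4, hidden_size=8)
    token_embeddings = [[0.0] * 8 for _ in range(3)]  # stand-in for frozen embeddings
    sequence = soft_prompt.prepend_to(token_embeddings)
    print(len(sequence), soft_prompt.num_trainable())
```

In this sketch only `prompt` and `scale` would receive gradient updates, so the trainable share shrinks as the frozen backbone grows, which is the source of the memory and training-cost savings the abstract refers to.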


