As Large Language Models (LLMs) grow in size, they demonstrate stronger language understanding and generation capabilities, but the memory demand and computation cost of fine-tuning them on downstream tasks are non-negligible. Moreover, fine-tuning generally requires a certain amount of data for each task, and the cost of data collection is another issue in real-world applications. In this work, we focus on Parameter-Efficient Fine-Tuning (PEFT) methods for few-shot Natural Language Generation (NLG), which freeze most LLM parameters and tune a small subset in few-shot settings, so that memory footprint, training cost, and labeling cost are reduced while performance is maintained or even improved. We propose Scaled Prompt-Tuning (SPT), a method that surpasses conventional Prompt-Tuning (PT) in performance and generalization ability without an obvious increase in training cost. Further study of intermediate SPT suggests that SPT has superior transferability in few-shot scenarios, providing a recipe for data-deficient and computation-limited circumstances. Moreover, a comprehensive comparison of existing PEFT methods reveals that certain approaches that showed decent performance at modest training cost in prior studies, such as Prefix-Tuning, can struggle in few-shot NLG tasks, especially on challenging datasets.
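To make the idea of tuning only a small subset of parameters concrete, the following is a minimal pure-Python sketch of a soft prompt prepended to frozen token embeddings. The abstract does not specify how SPT applies its scaling, so the per-dimension `scale` vector here is an assumption made purely for illustration, and the names `ScaledSoftPrompt`, `prepend_to`, and `trainable_fraction` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ScaledSoftPrompt:
    """Hypothetical soft prompt: k trainable vectors plus a trainable scale.

    ASSUMPTION: a per-dimension scale vector; the abstract does not define
    the actual scaling used by SPT.
    """
    prompt: List[Vector]  # shape (k, d), trainable
    scale: Vector         # shape (d,), trainable

    def prepend_to(self, token_embeddings: List[Vector]) -> List[Vector]:
        # Scale each prompt vector elementwise, then place the scaled
        # prompt in front of the (frozen) token embeddings.
        scaled = [[p * s for p, s in zip(row, self.scale)] for row in self.prompt]
        return scaled + token_embeddings

    def num_trainable(self) -> int:
        # Prompt entries (k * d) plus scale entries (d).
        return len(self.prompt) * len(self.scale) + len(self.scale)


def trainable_fraction(num_trainable: int, num_frozen: int) -> float:
    """Share of all parameters that receive gradient updates."""
    return num_trainable / (num_trainable + num_frozen)


# Toy example with k = 2 prompt vectors of dimension d = 2.
soft_prompt = ScaledSoftPrompt(prompt=[[1.0, 2.0], [3.0, 4.0]], scale=[0.5, 2.0])
sequence = soft_prompt.prepend_to([[9.0, 9.0]])
```

In this toy setting the backbone embeddings pass through unchanged and only the prompt and scale entries would be updated during training, which is why the trainable share stays small relative to the frozen model.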