Parameter-efficient fine-tuning (PEFT) has proven effective for adapting pre-trained language models to downstream tasks while updating only a small number of parameters. Despite this success, most existing methods adapt to each task independently, without considering knowledge transfer between tasks, and are limited in low-data regimes. To overcome these issues, we propose Prototype-based HyperAdapter (PHA), a novel framework built on adapter-tuning and hypernetworks. It introduces an instance-dense retriever and a prototypical hypernetwork to generate conditional modules in a sample-efficient manner. This yields performance improvements comparable to those of existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, as the amount of available data shrinks, our method outperforms other strong baselines by a large margin. Extensive experiments across various datasets demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on downstream tasks, and sample efficiency.