Parameter-efficient fine-tuning (PEFT) has proven effective at adapting pre-trained language models to downstream tasks while updating only a small number of parameters. Despite this success, most existing methods adapt to each task independently, without considering knowledge transfer between tasks, and perform poorly in low-data regimes. To overcome these issues, we propose Prototype-based HyperAdapter (PHA), a novel framework built on adapter-tuning and hypernetworks. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This yields performance improvements comparable to those of existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, as the amount of available data shrinks, our method outperforms other strong baselines by a large margin. Based on extensive experiments across various datasets, we demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on downstream tasks, and sample efficiency.
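To make the idea of a hypernetwork that generates conditional adapter modules more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one prototype vector per task and a single linear hypernetwork that maps a prototype to the weights of a bottleneck adapter. All names and sizes (`HIDDEN`, `BOTTLENECK`, `EMBED`, `generate_adapter`, `adapter_forward`) are hypothetical, and both the instance-dense retriever and any training procedure are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
HIDDEN = 16      # hidden size of the frozen backbone layer
BOTTLENECK = 4   # adapter bottleneck dimension
EMBED = 8        # dimension of a task prototype
NUM_TASKS = 3

# Assumption: one prototype vector per task (randomly initialized here;
# in practice these would be learned).
task_prototypes = rng.normal(size=(NUM_TASKS, EMBED))

# Assumption: the hypernetwork is a pair of linear maps shared across tasks.
# Each maps a prototype to the flattened weights of one adapter projection.
W_down_gen = rng.normal(scale=0.02, size=(EMBED, HIDDEN * BOTTLENECK))
W_up_gen = rng.normal(scale=0.02, size=(EMBED, BOTTLENECK * HIDDEN))


def generate_adapter(task_id):
    """Generate task-conditioned adapter weights from the task prototype."""
    prototype = task_prototypes[task_id]
    down = (prototype @ W_down_gen).reshape(HIDDEN, BOTTLENECK)
    up = (prototype @ W_up_gen).reshape(BOTTLENECK, HIDDEN)
    return down, up


def adapter_forward(hidden_states, task_id):
    """Apply a residual bottleneck adapter with generated weights."""
    down, up = generate_adapter(task_id)
    bottleneck = np.maximum(hidden_states @ down, 0.0)  # ReLU nonlinearity
    return hidden_states + bottleneck @ up               # residual connection


hidden_states = rng.normal(size=(2, HIDDEN))  # a batch of two token vectors
output = adapter_forward(hidden_states, task_id=1)
```

In this sketch the only task-specific parameters are the prototypes. The generator matrices are shared across all tasks, which is one way a hypernetwork can allow knowledge transfer between tasks while keeping the number of trainable parameters small.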