Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models. It becomes parameter-inefficient, however, when all the parameters of a large pre-trained model must be updated for each downstream task. As the number of parameters grows, fine-tuning is prone to overfitting and catastrophic forgetting. In addition, full fine-tuning can become prohibitively expensive when the model is used for many tasks. To mitigate these issues, parameter-efficient transfer learning algorithms, such as adapters and prefix tuning, have been proposed; they introduce a small number of trainable parameters that can be plugged into large pre-trained models such as BERT and HuBERT. In this paper, we introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning across various speech-processing tasks. Additionally, we introduce a new adapter, ConvAdapter, based on 1D convolution. We show that ConvAdapter outperforms standard adapters and performs comparably to prefix tuning and LoRA with only 0.94% of parameters trainable on some of the tasks in SURE. We further explore the effectiveness of parameter-efficient transfer learning for speech synthesis tasks such as Text-to-Speech (TTS).
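To make the notion of a "trainable-parameter fraction" concrete, the following is a minimal sketch that counts the parameters of a standard bottleneck adapter (two linear projections) and of a hypothetical adapter built from two 1D convolutions, then computes what share of the combined model they represent. It is an illustration only, not the ConvAdapter architecture from the paper: the layer layout, hidden size, bottleneck width, kernel size, layer count, and backbone size are all assumed values, and the result is not meant to reproduce the 0.94% figure.

```python
def linear_params(d_in, d_out, bias=True):
    """Weights (d_in * d_out) plus an optional bias vector of length d_out."""
    return d_in * d_out + (d_out if bias else 0)


def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a dense (groups=1) 1D convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def bottleneck_adapter_params(hidden, bottleneck):
    """Down-projection followed by up-projection, both linear."""
    return linear_params(hidden, bottleneck) + linear_params(bottleneck, hidden)


def conv_adapter_params(hidden, bottleneck, kernel_size):
    """Hypothetical conv-based adapter: two 1D convolutions (an assumption)."""
    return (conv1d_params(hidden, bottleneck, kernel_size)
            + conv1d_params(bottleneck, hidden, kernel_size))


def trainable_fraction(adapter_params_per_layer, num_layers, backbone_params):
    """Share of trainable (adapter) parameters in backbone + adapters."""
    trainable = adapter_params_per_layer * num_layers
    return trainable / (backbone_params + trainable)


if __name__ == "__main__":
    # All values below are illustrative assumptions, not taken from the paper.
    HIDDEN, BOTTLENECK, KERNEL, LAYERS = 768, 32, 3, 12
    BACKBONE = 95_000_000  # assumed frozen backbone size

    for name, per_layer in [
        ("linear adapter", bottleneck_adapter_params(HIDDEN, BOTTLENECK)),
        ("conv adapter", conv_adapter_params(HIDDEN, BOTTLENECK, KERNEL)),
    ]:
        frac = trainable_fraction(per_layer, LAYERS, BACKBONE)
        print(f"{name}: {per_layer} params/layer, {frac:.2%} trainable")
```

Under these assumed settings both variants keep the trainable share well below a few percent of the model, which is the regime the abstract describes; the exact percentage depends entirely on the chosen dimensions and on where the adapters are inserted.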