Large pre-trained speech models have become the de facto paradigm, especially in scenarios where only a limited amount of labeled data is available. However, finetuning all parameters of a self-supervised pre-trained model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks grow. In this paper, we propose a novel approach called Two Parallel Adapter (TPA), which is inserted into a Conformer-based pre-trained model instead of finetuning all of its parameters. TPA is based on systematic studies of the residual adapter, a popular approach for finetuning a subset of parameters. We evaluate TPA on various public benchmarks, and experimental results demonstrate its superior performance, which is close to that of full finetuning across different datasets and speech tasks. These results show that TPA is an effective and efficient approach for serving large pre-trained speech models. Ablation studies show that TPA can also be pruned, especially in the lower blocks.
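For readers unfamiliar with the residual adapter that TPA builds on, the following is a minimal sketch of a generic bottleneck residual adapter, not of TPA itself, whose exact structure the abstract does not specify. The helper names, the ReLU nonlinearity, the zero-initialised up-projection, and the example dimensions are all assumptions made for illustration.

```python
import math
import random


def make_adapter(d_model, bottleneck, seed=0):
    """Create weights for a generic bottleneck residual adapter (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_model)
    # Down-projection: d_model -> bottleneck, small random values.
    w_down = [[rng.gauss(0.0, scale) for _ in range(bottleneck)]
              for _ in range(d_model)]
    # Up-projection: bottleneck -> d_model, zero-initialised (an assumption)
    # so that the adapter starts out as an identity mapping.
    w_up = [[0.0] * d_model for _ in range(bottleneck)]
    return {"w_down": w_down, "w_up": w_up}


def matvec(x, w):
    """Compute x @ w for a vector x (length n) and a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


def adapter_forward(x, adapter):
    """Residual adapter: x + up(relu(down(x)))."""
    hidden = [max(0.0, h) for h in matvec(x, adapter["w_down"])]
    update = matvec(hidden, adapter["w_up"])
    return [xi + ui for xi, ui in zip(x, update)]


def adapter_param_count(d_model, bottleneck):
    """Trainable weights only; biases and layer norm are omitted in this sketch."""
    return 2 * d_model * bottleneck


if __name__ == "__main__":
    adapter = make_adapter(d_model=8, bottleneck=2)
    x = [float(i) for i in range(8)]
    # With a zero up-projection the output equals the input.
    print(adapter_forward(x, adapter) == x)
    # Adapter weights versus one dense d_model x d_model layer (illustrative sizes).
    print(adapter_param_count(1024, 64), 1024 * 1024)
```

In this sketch only the two small projection matrices would be trained while the pre-trained weights stay frozen, which is why the per-task cost grows with the bottleneck width rather than with the full model size.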