Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

The advent of hyper-scale, general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performance when fine-tuned on downstream tasks. Nevertheless, re-training all the parameters of these massive models entails substantial time and cost, along with a large carbon footprint. To overcome these limitations, this study explores and applies efficient transfer learning methods in the audio domain. We also propose an integrated parameter-efficient tuning (IPET) framework that aggregates the embedding prompt (a prompt-based learning approach) and the adapter (an effective transfer learning method). We demonstrate the efficacy of the proposed framework using two backbone pre-trained audio models with different characteristics: the audio spectrogram transformer and wav2vec 2.0. The proposed IPET framework exhibits remarkable performance compared with the fine-tuning method while using fewer trainable parameters across four downstream tasks: sound event classification, music genre classification, keyword spotting, and speaker verification. Furthermore, we identify and analyze the shortcomings of the IPET framework, providing lessons and research directions for parameter-efficient tuning in the audio domain.
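The abstract does not spell out the mechanics, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the embedding prompt is a set of learnable vectors prepended to the frozen backbone's input embedding sequence, and the adapter is a residual bottleneck (down-projection, ReLU, up-projection) inserted once per transformer layer. The helper names (`adapter`, `prepend_prompts`, `ipet_trainable`) and the sizes used below (a ViT-Base-sized encoder with hidden size 768 and 12 layers, bottleneck 64, 100 prompt tokens) are hypothetical, chosen only to show the order of magnitude of the savings. The count covers the encoder blocks only, not patch or feature embeddings or task heads.

```python
def linear(x, w, b):
    """y = W x + b for a single vector x; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def adapter(h, w_down, b_down, w_up, b_up):
    """Hypothetical bottleneck adapter with residual: h + W_up relu(W_down h)."""
    z = linear(relu(linear(h, w_down, b_down)), w_up, b_up)
    return [a + c for a, c in zip(h, z)]


def prepend_prompts(prompts, tokens):
    """Assumed embedding prompt: learnable vectors placed before frozen token embeddings."""
    return prompts + tokens


def encoder_params(d, layers, mlp_ratio=4):
    """Parameters in a standard transformer encoder stack (all updated by full fine-tuning)."""
    attn = 4 * d * d + 4 * d                            # Q, K, V, output projections + biases
    mlp = 2 * d * (mlp_ratio * d) + mlp_ratio * d + d   # two linear layers + biases
    norms = 2 * 2 * d                                   # two LayerNorms (scale + shift)
    return layers * (attn + mlp + norms)


def adapter_params(d, r):
    """Down (d -> r) and up (r -> d) projections with biases."""
    return 2 * d * r + r + d


def ipet_trainable(d, layers, r, n_prompts):
    """Trainable parameters when only adapters and prompt vectors are tuned."""
    return layers * adapter_params(d, r) + n_prompts * d


if __name__ == "__main__":
    full = encoder_params(768, 12)
    ipet = ipet_trainable(768, 12, 64, 100)
    print(full, ipet, f"{ipet / full:.2%}")  # 85054464 1266432 1.49%
```

Under these assumptions, the tuned parameters are roughly 1.5% of the encoder's weights. Zero-initializing the up-projection makes each adapter start as an identity map, so training begins from the unmodified pre-trained backbone.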

