Prompt tuning is a technique that tunes a small set of parameters to steer a pre-trained language model (LM) to directly generate the output for downstream tasks. Recently, prompt tuning has demonstrated storage and computation efficiency in both natural language processing (NLP) and speech processing. These advantages make prompt tuning a candidate approach for serving a pre-trained LM for multiple tasks in a unified manner. For speech processing, SpeechPrompt shows high parameter efficiency and competitive performance on a few speech classification tasks. However, whether SpeechPrompt can serve a large number of tasks remains unanswered. In this work, we propose SpeechPrompt v2, a prompt tuning framework capable of performing a wide variety of speech classification tasks, covering multiple languages and prosody-related tasks. Experimental results show that SpeechPrompt v2 achieves performance on par with prior works with fewer than 0.15M trainable parameters in a unified framework.
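To make the notion of "a small set of trainable parameters" concrete, the following is a minimal sketch of input-level prompt tuning in general. It is not the SpeechPrompt v2 architecture: the class name `PromptTunedInput`, the sizes, and the random initialization are all hypothetical and chosen only for illustration. The frozen embedding table stands in for the pre-trained LM, and only the prepended prompt vectors would be updated during training.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random floats."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class PromptTunedInput:
    """Frozen embedding table plus a small trainable prompt (illustrative only)."""

    def __init__(self, vocab_size, embed_dim, prompt_length, seed=0):
        rng = random.Random(seed)
        # Stand-in for the pre-trained LM's parameters; these stay frozen.
        self.frozen_embeddings = make_matrix(vocab_size, embed_dim, rng)
        # The only parameters that would be tuned for a downstream task.
        self.prompt = make_matrix(prompt_length, embed_dim, rng)

    def build_input(self, token_ids):
        """Prepend the prompt vectors to the embedded input tokens."""
        return self.prompt + [self.frozen_embeddings[t] for t in token_ids]

    def trainable_parameter_count(self):
        return sum(len(row) for row in self.prompt)

    def frozen_parameter_count(self):
        return sum(len(row) for row in self.frozen_embeddings)


# Hypothetical sizes, chosen only to keep the example small.
layer = PromptTunedInput(vocab_size=100, embed_dim=16, prompt_length=5)
sequence = layer.build_input([3, 7, 42])
trainable = layer.trainable_parameter_count()
frozen = layer.frozen_parameter_count()
```

Under these assumptions, the per-task storage cost is the prompt alone (`prompt_length * embed_dim` values), while the frozen parameters are shared across all tasks, which is the property that makes serving many tasks from one pre-trained LM attractive.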