UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? We start by adapting a pre-trained automatic speech recognition model to additional tasks using single-token task specifiers. We enhance this approach through instruction tuning, i.e., finetuning by describing the task using natural language instructions followed by the list of label options. Our approach can generalize to new task descriptions for seen tasks during inference, thereby enhancing its user-friendliness. We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages. On most tasks, UniverSLU achieves competitive performance and often even surpasses task-specific models. Additionally, we assess the zero-shot capabilities, finding that the model generalizes to new datasets and languages for seen task types.
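
The two prompting styles mentioned in the abstract (a single-token task specifier versus a natural language instruction followed by the list of label options) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `SLUTask` class, the `<er>` token, the instruction wording, and the `Options:` separator are hypothetical and are not taken from the UniverSLU implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SLUTask:
    """Hypothetical task description (illustrative only)."""
    specifier: str          # single-token task specifier, e.g. "<er>"
    instruction: str        # natural language description of the task
    labels: Sequence[str]   # label options; empty for sequence generation


def specifier_prompt(task: SLUTask) -> str:
    """Prompt style 1: the task is identified by a single special token."""
    return task.specifier


def instruction_prompt(task: SLUTask) -> str:
    """Prompt style 2: the instruction followed by the list of label options."""
    if not task.labels:
        # Sequence generation tasks have no fixed label set to enumerate.
        return task.instruction
    return f"{task.instruction} Options: {', '.join(task.labels)}"


# Assumed example task: emotion recognition with four label options.
emotion = SLUTask(
    specifier="<er>",
    instruction="Classify the emotion of the speaker.",
    labels=["angry", "happy", "neutral", "sad"],
)

print(specifier_prompt(emotion))
print(instruction_prompt(emotion))
```

Under this sketch, a reworded `instruction` for the same task changes only the text prompt, which is the kind of variation the abstract refers to when it mentions generalizing to new task descriptions for seen tasks.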

