Recent studies have demonstrated promising results with large language models that have multi-task capabilities. These models use prompts to guide their behavior and surpass the performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? To address this question, we start from pre-trained automatic speech recognition (ASR) models and use task and dataset specifiers as discrete prompts. We demonstrate the efficacy of our single multi-task learning (MTL) model, "UniverSLU", on 12 different speech classification and sequence generation tasks across 17 datasets and 9 languages. Results show that UniverSLU achieves competitive performance and even surpasses task-specific models. We also conduct preliminary investigations into using human-interpretable natural phrases instead of task specifiers as discrete prompts, and we test the model's ability to generalize to new paraphrases.