Recent advancements in large language models (LLMs) have revolutionized various domains, bringing significant progress and new opportunities. Despite progress in speech-related tasks, LLMs have not been sufficiently explored in multi-talker scenarios. In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following versatile instructions related to multi-talker automatic speech recognition (ASR), target-talker ASR, and ASR based on specific talker attributes such as sex, occurrence order, language, and spoken keyword. Our approach utilizes WavLM and Whisper encoders to extract multi-faceted speech representations that are sensitive to speaker characteristics and semantic context. These representations are then fed into an LLM fine-tuned using LoRA, equipping it with speech comprehension and transcription capabilities. Comprehensive experiments reveal the promising performance of our proposed system, MT-LLM, in cocktail-party scenarios, highlighting the potential of LLMs to handle speech-related tasks based on user instructions in such complex settings. The code, model, and samples are available at https://github.com/cuhealthybrains/MT-LLM.