Large language models (LLMs) exhibit remarkable performance across diverse tasks, indicating their potential for expansion into large speech-text models (LSMs) by integrating speech capabilities. Although unified speech-text pre-training and instruction tuning on multimodal data offer considerable benefits, these methods generally entail significant resource demands and tend to overfit specific tasks. This study aims to refine the use of speech datasets for LSM training by addressing the limitations of vanilla instruction tuning. We explore the instruction-following dynamics within LSMs and identify a critical issue termed speech anchor bias: a tendency for LSMs to over-rely on speech inputs, mistakenly interpreting the entire speech modality as directives and thereby neglecting textual instructions. To counteract this bias, we introduce a self-powered LSM that leverages augmented automatic speech recognition data generated by the model itself for more effective instruction tuning. Our experiments across a range of speech-based tasks demonstrate that the self-powered LSM mitigates speech anchor bias and improves the fusion of the speech and text modalities in LSMs. Data, code, and scripts are freely available at https://github.com/ytf-philp/Self-powered-LSM.