Language diversity presents a significant challenge in speech-to-text (S2T) tasks, such as automatic speech recognition and translation. Traditional multi-task training approaches aim to address this by jointly optimizing multiple speech recognition and translation tasks across various languages. While models like Whisper, built on these strategies, demonstrate strong performance, they still suffer from high computational cost, language interference, suboptimal training configurations, and limited extensibility. To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a novel technique designed to efficiently integrate models trained on different languages or tasks while preserving performance and reducing computational overhead. LoRS-Merging combines low-rank approximation and sparse pruning to retain essential structures while eliminating redundant parameters, mitigating language and task interference, and enhancing extensibility. Experimental results across a range of languages demonstrate that LoRS-Merging reduces the word error rate by 10% and improves BLEU scores by 4% compared to conventional multi-lingual multi-task training baselines. Our findings suggest that model merging, particularly LoRS-Merging, is a scalable and effective complement to traditional multi-lingual training strategies for S2T applications.
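To make the idea concrete, here is a minimal sketch of the merging recipe the abstract describes, under common task-vector merging assumptions: each language- or task-specific model is reduced to its delta from a shared base, compressed by SVD-based low-rank truncation and magnitude pruning, and the compressed deltas are averaged back onto the base. The function names (`lors_merge`, `low_rank_approx`, `sparse_prune`) and the rank/sparsity settings are illustrative; the paper's exact procedure and hyperparameters are not specified here.

```python
import numpy as np

def low_rank_approx(delta, rank):
    """Keep only the top-`rank` singular components of a weight delta."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

def sparse_prune(delta, keep_ratio):
    """Zero out all but the largest-magnitude `keep_ratio` fraction of entries."""
    k = max(1, int(delta.size * keep_ratio))
    threshold = np.partition(np.abs(delta).ravel(), -k)[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

def lors_merge(base, finetuned_weights, rank=8, keep_ratio=0.1):
    """Merge language/task-specific weights into a single set of weights.

    Each fine-tuned model contributes a task vector (its delta from the
    base), which is compressed by low-rank approximation plus sparse
    pruning before the compressed deltas are averaged onto the base.
    """
    merged_delta = np.zeros_like(base)
    for w in finetuned_weights:
        delta = w - base
        delta = low_rank_approx(delta, rank)
        delta = sparse_prune(delta, keep_ratio)
        merged_delta += delta
    return base + merged_delta / len(finetuned_weights)

# Toy usage: merge two hypothetical "language-specific" weight matrices.
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 64))
models = [base + 0.01 * rng.standard_normal((64, 64)) for _ in range(2)]
merged = lors_merge(base, models, rank=8, keep_ratio=0.1)
print(merged.shape)
```

In this reading, the low-rank step preserves the dominant structure of each task vector while the sparsity step discards small redundant updates, which is what lets the merged deltas coexist with reduced language and task interference.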