Recent advances in large language models (LLMs) have spurred interest in speech-text multimodal foundation models, which achieve strong performance on instruction-based speech translation (ST). However, expanding the language pairs of an existing instruction-tuned ST system is costly, because it requires re-training on a combination of the new and previous datasets. We propose to add new language pairs by merging a model trained on the new pairs with the existing model via task arithmetic. We find that directly applying task arithmetic to ST causes the merged model to fail to follow instructions and thus generate translations in incorrect languages. To eliminate this language confusion, we propose an augmented task arithmetic method that additionally merges a language control model, which is trained to generate the correct target-language token following the instructions. Our experiments demonstrate that the proposed language control model achieves language expansion by eliminating language confusion, improving BLEU by up to 4.66 on MuST-C and 4.92 on CoVoST-2. In addition, we show that our task arithmetic framework can extend to language pairs for which neither paired ST training data nor a pre-trained ST model is available: we first synthesize an ST system from machine translation (MT) systems via task analogy, then merge the synthesized system into the existing ST model.
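The merging operations the abstract refers to can be sketched in a few lines. This is a minimal illustration of generic task arithmetic on toy parameter dictionaries, not the paper's actual checkpoints or merging coefficients: each fine-tuned model contributes a "task vector" (its parameter-wise difference from the shared base), and scaled task vectors are summed onto the base; task analogy composes vectors of related tasks (here, approximating a new ST model from MT models and an existing ST model).

```python
import numpy as np

def task_vector(finetuned, base):
    # Task vector: parameter-wise difference between a fine-tuned model
    # and the shared base it was initialized from.
    return {k: finetuned[k] - base[k] for k in base}

def merge(base, task_vectors, scale=1.0):
    # Task arithmetic: add scaled task vectors onto the base parameters.
    merged = {k: v.copy() for k, v in base.items()}
    for tv in task_vectors:
        for k in merged:
            merged[k] += scale * tv[k]
    return merged

# Toy 3-parameter "models" (names are illustrative only).
base      = {"w": np.zeros(3)}
st_old    = {"w": np.array([1.0, 0.0, 0.0])}  # existing ST model
st_new    = {"w": np.array([0.0, 1.0, 0.0])}  # model trained on the new pair
lang_ctrl = {"w": np.array([0.0, 0.0, 0.5])}  # language control model

tvs = [task_vector(m, base) for m in (st_old, st_new, lang_ctrl)]
merged = merge(base, tvs, scale=1.0)
print(merged["w"])  # [1.  1.  0.5]

# Task analogy (hypothetical shapes): synthesize an ST task vector for a new
# pair from MT systems, ST_new ≈ ST_old + (MT_new - MT_old).
mt_old = {"w": np.array([0.2, 0.0, 0.0])}
mt_new = {"w": np.array([0.0, 0.2, 0.0])}
st_synth = {k: st_old[k] + mt_new[k] - mt_old[k] for k in base}
print(st_synth["w"])  # [0.8 0.2 0. ]
```

The `scale` coefficient controls how strongly each task vector is applied; in practice such coefficients are tuned per merged model on a validation set.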