The development of resource-constrained approaches to automatic speech recognition (ASR) is of great interest due to its broad applicability to many low-resource languages for which there is scant usable data. Existing approaches to many low-resource natural language processing tasks leverage additional data from higher-resource languages that are closely related to a target low-resource language. One increasingly popular approach uses task arithmetic to combine models trained on different tasks to create a model for a task with little to no training data. In this paper, we treat training on a particular language as a task, and we generate task vectors by fine-tuning variants of the Whisper ASR system. For pairings of high- and low-resource languages, we merge task vectors via a linear combination, choosing the combination weights to minimize word error rate on the low-resource target language's validation set. We find that this approach consistently improves performance on the target languages.
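The merging procedure described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: parameters are modeled as flat dicts of floats, and the function names (`task_vector`, `merge`) and the fixed weights are assumptions for demonstration; in practice the weights would be selected by minimizing validation word error rate on the target language.

```python
# Hypothetical sketch of task arithmetic for model merging.
# A task vector is the element-wise difference between fine-tuned
# and pretrained weights; merging adds a weighted linear combination
# of task vectors back to the base model.

def task_vector(base, finetuned):
    """Task vector = fine-tuned weights minus pretrained weights."""
    return {k: finetuned[k] - base[k] for k in base}

def merge(base, vectors, lambdas):
    """Add a weighted linear combination of task vectors to the base."""
    merged = dict(base)
    for lam, tau in zip(lambdas, vectors):
        for k in merged:
            merged[k] += lam * tau[k]
    return merged

# Toy example: two "language" fine-tunes derived from one base model.
base = {"w": 1.0}
ft_high = {"w": 1.5}   # high-resource language fine-tune (illustrative)
ft_low = {"w": 0.8}    # low-resource language fine-tune (illustrative)
taus = [task_vector(base, ft_high), task_vector(base, ft_low)]

# Weights fixed here for illustration; the paper tunes them against
# validation WER on the low-resource target language.
merged = merge(base, taus, lambdas=[0.4, 0.6])
print(merged["w"])  # 1.0 + 0.4*0.5 + 0.6*(-0.2) = 1.08
```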