Pre-trained multilingual speech foundation models, such as Whisper, have shown impressive performance across many languages. However, adapting these models to new or specific languages is computationally expensive and prone to catastrophic forgetting. To address these issues, our study investigates strategies for enhancing the model on new languages in the absence of the original training data, while preserving the established performance on the original languages. Specifically, we first compare various LoRA-based methods to assess their vulnerability to forgetting. To mitigate this issue, we propose leveraging the LoRA parameters of the original model to perform approximate orthogonal gradient descent on the new samples. We also introduce a learnable rank coefficient that allocates trainable parameters for more efficient training. Our experiments with a Chinese Whisper model (adapted to Uyghur and Tibetan) yield better results with a more compact set of parameters.
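The abstract's core idea of approximate orthogonal gradient descent can be illustrated as follows. This is a minimal NumPy sketch, not the paper's implementation: it assumes the original adapter's LoRA factor spans the update directions learned for the old languages, and projects a new-language gradient onto the orthogonal complement of that subspace so the update disturbs the old directions as little as possible. All names (`B_old`, `project_orthogonal`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # hypothetical weight dimension and LoRA rank

# Hypothetical LoRA factor from the original (old-language) adapter.
B_old = rng.standard_normal((d, r))

# Orthonormal basis Q for the subspace spanned by the old update directions.
Q, _ = np.linalg.qr(B_old)

def project_orthogonal(grad, Q):
    """Remove the component of `grad` in span(Q): g_perp = g - Q (Q^T g)."""
    return grad - Q @ (Q.T @ grad)

# A new-language gradient, projected before the descent step.
grad = rng.standard_normal((d, r))
g_perp = project_orthogonal(grad, Q)

# g_perp is (numerically) orthogonal to every old LoRA direction.
print(np.allclose(Q.T @ g_perp, 0.0, atol=1e-10))
```

Descending along `g_perp` instead of `grad` is one simple way to realize the "orthogonal to the old task" constraint; the paper's method additionally uses a learnable rank coefficient to decide how many such directions to train.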