Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language while maintaining performance on the original language set. Fine-tuning, while simple, may degrade accuracy on the original set. We compare three approaches that exploit adaptation parameters: soft language code tuning, which trains only the language code; soft prompt tuning, which trains prepended tokens; and LoRA, where a small set of additional parameters is optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the existing language capabilities are maintained, but at the cost of performance in the new language.
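To make the EWC compromise concrete, the following is a minimal sketch of the commonly used diagonal-Fisher form of the EWC objective, in which the new-language loss is augmented by a quadratic penalty that discourages moving parameters that matter for the languages to be protected. The abstract does not state the exact formulation used, so this form is an assumption, and the names `ewc_penalty`, `ewc_total_loss`, `anchor_params`, `fisher_diag`, and `strength` are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    strength: float,
) -> float:
    """Return (strength / 2) * sum_i F_i * (theta_i - theta*_i) ** 2.

    params        -- current parameters while training on the new language
    anchor_params -- parameters of the original model (assumed frozen copy)
    fisher_diag   -- diagonal Fisher estimates; assumed here to be computed
                     on data from the languages whose performance is protected
    strength      -- weight of the penalty (hypothetical hyperparameter name)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all parameter vectors must have the same length")
    weighted_sq_dist = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
    return 0.5 * strength * weighted_sq_dist


def ewc_total_loss(
    new_language_loss: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    strength: float,
) -> float:
    """New-language loss plus the EWC penalty on drift from the original model."""
    return new_language_loss + ewc_penalty(
        params, anchor_params, fisher_diag, strength
    )
```

Under this sketch, parameters with a large Fisher value are held close to their original values, while parameters with a small Fisher value remain free to adapt to the new language. This is consistent with the abstract's point that EWC protects the specific languages it is configured for.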