Self-supervised learning (SSL) models have shown strong performance on various downstream tasks. However, they are typically developed for a limited set of languages and may encounter new languages in real-world applications. Developing an SSL model for each new language is costly, so it is vital to efficiently adapt existing SSL models to a new language without impairing their original abilities. We propose adaptation methods that integrate LoRA into existing SSL models to extend them to a new language. We also develop preservation strategies, including data combination and re-clustering, to retain performance on the original languages. Applying these methods to mHuBERT, we investigate their effectiveness on a speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin), improving MOS by about 1.6 and reducing WER by up to 61.72% relative. Moreover, our preservation strategies ensure that performance on both the original and the new languages remains intact.
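The abstract does not specify implementation details, but the core idea of integrating LoRA into a pre-trained SSL encoder can be illustrated with a minimal PyTorch sketch. The names `LoRALinear` and `inject_lora`, the rank/alpha defaults, and the assumed fairseq-style layout (`encoder.layers[i].self_attn.{q_proj,v_proj}`, as in HuBERT/mHuBERT) are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (LoRA).

    Output is base(x) + (alpha / rank) * B(A(x)), where A and B are the
    only trainable parameters; the base weights stay frozen.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # preserve the pre-trained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.01)
        nn.init.zeros_(self.lora_b.weight)  # update is zero at initialization
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def inject_lora(encoder: nn.Module, rank: int = 8) -> None:
    """Replace the query/value projections of each self-attention block
    with LoRA-wrapped versions (hypothetical fairseq-style layout)."""
    for layer in encoder.layers:
        attn = layer.self_attn
        attn.q_proj = LoRALinear(attn.q_proj, rank)
        attn.v_proj = LoRALinear(attn.v_proj, rank)
```

Under this setup, adapting to a new language means training only the low-rank `lora_a`/`lora_b` matrices on the new-language data, which keeps the adapted parameter count small and leaves the original encoder weights untouched.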