Self-supervised learning (SSL) has greatly advanced speech representation learning, but multilingual SSL models remain constrained to the languages encountered during pretraining. Retraining from scratch to incorporate new languages is computationally expensive, while sequential training without mitigation strategies often leads to catastrophic forgetting. To address this, we propose MiLorE-SSL, a lightweight framework that combines LoRA modules with a soft mixture-of-experts (MoE) mechanism for efficient continual multilingual training. LoRA provides efficient low-rank adaptation, while soft MoE promotes flexible expert sharing across languages, reducing cross-lingual interference. To further mitigate forgetting, we introduce a limited amount of replay data from existing languages, avoiding reliance on large historical corpora. Experiments on ML-SUPERB demonstrate that MiLorE-SSL achieves strong performance on new languages and improves performance on existing ones with only 2.14% trainable parameters.
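The abstract does not specify implementation details, but the core idea of combining LoRA adapters with soft MoE routing can be illustrated with a minimal sketch. The snippet below shows one frozen linear layer of a pretrained SSL encoder augmented with several LoRA experts whose low-rank updates are mixed by softmax gate weights; the class name, expert count, rank, and scaling are hypothetical choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SoftMoELoRA(nn.Module):
    """Sketch: a frozen linear layer plus multiple LoRA experts whose
    low-rank updates are blended by a softmax gate (soft MoE).
    Hyperparameters here are illustrative, not from the paper."""

    def __init__(self, base_linear: nn.Linear, num_experts: int = 4,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.scaling = alpha / rank
        # One (A, B) low-rank pair per expert.
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        # Token-wise gate producing soft mixing weights over experts.
        self.gate = nn.Linear(d_in, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_in)
        weights = torch.softmax(self.gate(x), dim=-1)                # (B, T, E)
        # Low-rank update from every expert: (B, T, E, d_out)
        delta = torch.einsum("btd,edr,ero->bteo", x, self.lora_A, self.lora_B)
        # Soft mixture of expert updates, added to the frozen base output.
        mixed = torch.einsum("bte,bteo->bto", weights, delta) * self.scaling
        return self.base(x) + mixed
```

In such a setup, only the LoRA matrices and the gate are trainable, which is how a framework of this kind can keep the trainable-parameter budget to a few percent of the full model; because every token receives a soft mixture of all experts rather than a hard assignment, experts can be shared across languages rather than dedicated to a single one.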