Despite their impressive performance, self-supervised speech models often struggle to generalize to new languages and tend to forget previously acquired knowledge during continual training. To address this, we propose Lamer-SSL, a parameter-efficient framework that integrates a Layer-Aware MixturE of LoRA Experts (Lamer) module with a replay strategy. The Lamer module enables flexible balancing between shared and language-specific representations, while layer-aware expert allocation assigns more experts to deeper layers, where semantic information is richer. Meanwhile, the replay strategy retains prior knowledge using minimal data, mitigating forgetting during continual training. Experiments on automatic speech recognition (ASR) and language identification (LID) demonstrate that Lamer-SSL effectively extends self-supervised models to new languages while maintaining strong performance on previously learned languages, with only 2.14% of its parameters trainable.
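To make the two core ideas concrete, the following is a minimal sketch of a layer-aware mixture of LoRA experts, assuming a Transformer backbone whose linear projections are frozen; the class names (`LoRAExpert`, `LamerLayer`), the softmax router, and the expert-count schedule are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: layer-aware mixture of LoRA experts (names and schedule are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter producing a delta of (alpha / rank) * B(A(x))."""
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init so training starts at the base model
        self.scale = alpha / rank

    def forward(self, x):
        return F.linear(F.linear(x, self.A), self.B) * self.scale


class LamerLayer(nn.Module):
    """Frozen base projection plus a router-weighted mixture of LoRA experts."""
    def __init__(self, base_linear, num_experts, rank=8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapters and the router are trainable
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, rank) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):
        gate = F.softmax(self.router(x), dim=-1)                    # (..., E)
        delta = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., d_out, E)
        return self.base(x) + (delta * gate.unsqueeze(-2)).sum(-1)


def experts_per_layer(layer_idx, num_layers, min_experts=2, max_experts=8):
    """Layer-aware allocation: deeper (more semantic) layers receive more experts."""
    frac = layer_idx / max(num_layers - 1, 1)
    return round(min_experts + frac * (max_experts - min_experts))
```

Under this reading, the trainable-parameter fraction is controlled by the LoRA rank, the per-layer expert counts, and the router size, while the backbone weights stay frozen; the replay buffer used to mitigate forgetting is orthogonal to this module and is not shown here.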