Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows substantially as the number of supported languages increases. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through the Mixture of Language Experts (MLE), whose learning is guided by a frame-wise language routing mechanism. A weight-shared frame-level language identification (LID) network is jointly trained and serves as the shared pre-router for each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over the baseline while maintaining comparable computational efficiency.
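To illustrate what frame-wise language routing means, here is a minimal sketch in plain Python. It assumes hard (argmax) routing, toy linear layers in place of real LID and expert networks, a residual connection around each expert, and routing decisions computed once from the shared LID network and reused by every MLE layer. All function names (frame_language_routes, mle_layer, lr_moe_stack) are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def frame_language_routes(frames: Sequence[Vector], lid_weights: Matrix) -> List[int]:
    """Hypothetical shared LID pre-router: picks one language index per frame.

    lid_weights has one row per language; the argmax logit is the routed language.
    """
    routes = []
    for frame in frames:
        logits = matvec(lid_weights, frame)  # one logit per supported language
        routes.append(max(range(len(logits)), key=logits.__getitem__))
    return routes


def mle_layer(frames: Sequence[Vector], routes: Sequence[int],
              experts: Sequence[Matrix]) -> List[Vector]:
    """Hypothetical Mixture of Language Experts layer.

    Each frame is processed only by the expert of its routed language,
    followed by a residual connection (an assumption of this sketch).
    """
    out = []
    for frame, lang in zip(frames, routes):
        expert_out = matvec(experts[lang], frame)
        out.append([x + y for x, y in zip(frame, expert_out)])
    return out


def lr_moe_stack(frames: Sequence[Vector], lid_weights: Matrix,
                 layers: Sequence[Sequence[Matrix]]) -> Tuple[List[Vector], List[int]]:
    """Route once with the shared LID network, then reuse the routes in every MLE layer."""
    routes = frame_language_routes(frames, lid_weights)
    hidden = [list(f) for f in frames]
    for experts in layers:
        hidden = mle_layer(hidden, routes, experts)
    return hidden, routes


if __name__ == "__main__":
    lid = [[1.0, 0.0], [0.0, 1.0]]  # toy LID projection for 2 languages
    frames = [[2.0, 0.5], [0.1, 3.0], [1.5, 1.0]]
    experts = [[[1.0, 0.0], [0.0, 0.0]],  # expert for language 0
               [[0.0, 0.0], [0.0, 1.0]]]  # expert for language 1
    hidden, routes = lr_moe_stack(frames, lid, [experts, experts])
    print(routes)  # [0, 1, 0]
    print(hidden)  # [[8.0, 0.5], [0.1, 12.0], [6.0, 1.0]]
```

Because each frame runs through exactly one expert per layer, per-frame cost in this sketch does not grow with the number of languages, apart from the small LID projection. This is the intuition behind the efficiency claim above. A real system would use trained neural LID and expert modules rather than these fixed toy matrices.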