Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows substantially as the number of supported languages increases. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through the Mixture of Language Experts (MLE), whose learning is guided by a frame-wise language routing mechanism. A weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over the baseline, with comparable computational efficiency.
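The frame-wise language routing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes hard top-1 routing (each frame goes to exactly one language expert), a linear LID classifier, and a routing decision that is computed once and reused by every MoE layer. All names (`lid_router`, `expert_ffn`, `mle_layer`) and all dimensions are hypothetical.

```python
import numpy as np

def lid_router(frames, w_lid):
    """Frame-level LID (assumed linear): one language index per frame."""
    logits = frames @ w_lid              # shape (T, num_langs)
    return logits.argmax(axis=-1)        # shape (T,)

def expert_ffn(x, w1, w2):
    """One language expert: a two-layer feed-forward block with ReLU."""
    return np.maximum(x @ w1, 0.0) @ w2

def mle_layer(frames, lang_ids, experts):
    """Run each frame through the expert of its predicted language only."""
    out = np.zeros_like(frames)
    for lang, (w1, w2) in enumerate(experts):
        mask = lang_ids == lang
        if mask.any():
            out[mask] = expert_ffn(frames[mask], w1, w2)
    return out

# Toy setup: 6 frames, 2 languages, 3 MoE layers (all values are assumptions).
rng = np.random.default_rng(0)
T, d_model, d_ff, num_langs, num_layers = 6, 8, 16, 2, 3
frames = rng.standard_normal((T, d_model))
w_lid = rng.standard_normal((d_model, num_langs))
layers = [
    [(rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
     for _ in range(num_langs)]
    for _ in range(num_layers)
]

# The shared pre-router is evaluated once; every layer reuses its decision.
lang_ids = lid_router(frames, w_lid)
hidden = frames
for experts in layers:
    hidden = hidden + mle_layer(hidden, lang_ids, experts)  # residual connection
```

Under these assumptions, each frame passes through a single expert per layer, so the per-frame cost of an MoE layer does not grow with the number of languages; only the parameter count does. This is one way to read the abstract's claim of comparable computational efficiency.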