The Mixture of Experts (MoE) approach is well suited to multilingual and code-switching (CS) challenges due to its multi-expert architecture. This work introduces DLG-MoE, a model optimized for bilingual and CS scenarios. Its novel Dynamic Language Group-based MoE layer features a language router with shared weights for explicit language modeling, while independent unsupervised routers within each language group handle attributes beyond language. This structure not only improves expert extensibility but also supports dynamic top-k training, allowing flexible inference across different top-k values and improving overall performance. The model requires no pre-training, supports streaming recognition, and achieves state-of-the-art (SOTA) results while offering greater flexibility than other methods. The code will be released.
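To make the two-level routing concrete, below is a minimal PyTorch sketch of such a layer under our own assumptions: the class name `DLGMoELayer`, the parameters `n_languages` and `experts_per_group`, and the hard argmax language assignment are all illustrative choices, not the authors' released implementation, and the routing details are simplified relative to the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DLGMoELayer(nn.Module):
    """Hypothetical sketch of a dynamic language group-based MoE layer.

    A shared-weight language router first assigns each frame to a language
    group; an independent unsupervised router inside each group then selects
    top-k experts to model attributes beyond language. Names and shapes are
    illustrative assumptions.
    """

    def __init__(self, d_model, n_languages=2, experts_per_group=4, d_ff=2048):
        super().__init__()
        # Language router (shared weights) for explicit language modeling.
        self.lang_router = nn.Linear(d_model, n_languages)
        # One unsupervised router per language group.
        self.group_routers = nn.ModuleList(
            nn.Linear(d_model, experts_per_group) for _ in range(n_languages)
        )
        # Expert feed-forward networks, grouped by language.
        self.experts = nn.ModuleList(
            nn.ModuleList(
                nn.Sequential(
                    nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
                )
                for _ in range(experts_per_group)
            )
            for _ in range(n_languages)
        )

    def forward(self, x, top_k=2):
        # x: (batch, time, d_model). top_k can be varied at inference time,
        # mirroring the flexibility that dynamic top-k training provides.
        lang_logits = self.lang_router(x)          # (B, T, n_languages)
        lang_id = lang_logits.argmax(dim=-1)       # hard per-frame language choice
        out = torch.zeros_like(x)
        for g, (router, experts) in enumerate(zip(self.group_routers, self.experts)):
            mask = lang_id == g                    # frames routed to group g
            if not mask.any():
                continue
            h = x[mask]                            # (N, d_model)
            gate = F.softmax(router(h), dim=-1)    # unsupervised gating within group
            weights, idx = gate.topk(top_k, dim=-1)
            weights = weights / weights.sum(-1, keepdim=True)
            mixed = torch.zeros_like(h)
            for k in range(top_k):
                for e in range(len(experts)):
                    sel = idx[:, k] == e
                    if sel.any():
                        mixed[sel] += weights[sel, k, None] * experts[e](h[sel])
            out[mask] = mixed
        # lang_logits can additionally take a supervised language-ID loss.
        return out, lang_logits
```

Because the group routers would be trained under varying k, the same checkpoint could trade accuracy for compute at inference simply by changing the `top_k` argument, e.g. `layer(x, top_k=1)` for a cheaper decoding pass.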