Mixture-of-Experts (MoE) architectures have shown strong multilingual capabilities, yet the internal mechanisms underlying their performance gains and cross-language differences remain insufficiently understood. In this work, we conduct a systematic analysis of MoE models, examining routing behavior and expert specialization across languages and network depth. Our analysis reveals that multilingual processing in MoE models is highly structured: routing aligns with linguistic families, expert utilization follows a clear layerwise pattern, and high-resource languages rely on shared experts while low-resource languages, despite weaker performance, depend more on language-exclusive experts. Layerwise interventions further show that early and late MoE layers support language-specific processing, whereas middle layers serve as language-agnostic capacity hubs. Building on these insights, we propose a routing-guided steering method that adaptively biases middle-layer routing toward shared experts associated with dominant languages at inference time, leading to consistent multilingual performance improvements, particularly for linguistically related language pairs. Our code is available at https://github.com/conctsai/Multilingualism-in-Mixture-of-Experts-LLMs.
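The routing-guided steering described above can be illustrated with a minimal sketch: at inference time, add a bias to the router logits of middle layers so that shared experts associated with the dominant language receive higher routing scores. Every name and value below (the function, the middle-layer range, the steering strength `alpha`, the toy logits) is a hypothetical illustration under assumed shapes, not the repository's actual implementation.

```python
import numpy as np

def steer_router_logits(logits, shared_expert_ids, alpha=1.0,
                        layer_idx=None, mid_layers=range(8, 16)):
    """Bias router logits toward shared experts in middle layers.

    logits            : per-token router logits, shape (..., num_experts)
    shared_expert_ids : indices of shared experts for the dominant language
    alpha             : steering strength added to those experts' logits
    layer_idx         : current MoE layer; steering applies only to mid_layers
    """
    if layer_idx is not None and layer_idx not in mid_layers:
        return logits  # early/late layers stay language-specific: no steering
    steered = logits.copy()
    steered[..., shared_expert_ids] += alpha  # shift mass toward shared experts
    return steered

# Toy example: 4 experts, experts 0 and 1 are "shared" for the dominant language.
logits = np.array([0.1, 0.2, 0.9, 0.3])
steered = steer_router_logits(logits, [0, 1], alpha=1.0, layer_idx=10)
# Top-1 routing moves from expert 2 to shared expert 1 after steering.
```

In a real MoE forward pass this bias would be applied before the top-k expert selection, so the steering changes which experts fire rather than just reweighting their outputs.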