We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities without extensive human-labeled data or added parameters. Our empirical results reveal that specializing LLMs can entail trade-offs in performance on non-specialized tasks. In contrast, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks spanning knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design through semantic experts and routing. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
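To make the compositional structure concrete, the following is a minimal, hypothetical sketch of the idea described above: a frozen shared base layer augmented with lightweight low-rank (LoRA-style) expert deltas, combined per token by a learned router. All names and hyperparameters here (MiXSELayer, num_experts, rank) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a MiXSE-style layer: shared frozen base weights,
# per-expert low-rank deltas, and a learned router that mixes experts.
# This is an assumption-laden illustration, not the authors' code.
import torch
import torch.nn as nn


class MiXSELayer(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        # Shared base LLM weights: kept frozen, no added full-size parameters.
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One LoRA-style low-rank delta per self-specialized expert,
        # trained on that expert's self-generated synthetic data.
        self.lora_a = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        # Self-optimized router: produces a soft mixture over experts per token.
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in)
        gates = torch.softmax(self.router(x), dim=-1)        # (b, s, E)
        # Each expert's adjustment: x @ A_e @ B_e, computed for all experts.
        delta = torch.einsum("bsd,edr,ero->bseo", x, self.lora_a, self.lora_b)
        # Route: gate-weighted sum of expert adjustments per token.
        mixed = torch.einsum("bse,bseo->bso", gates, delta)
        return self.base(x) + mixed                          # shared base + experts


if __name__ == "__main__":
    layer = MiXSELayer(d_in=16, d_out=16)
    out = layer(torch.randn(2, 5, 16))
    print(out.shape)  # torch.Size([2, 5, 16])
```

Under this reading, interpretability falls out of the design: inspecting the router's gate weights reveals which semantic expert (e.g., math vs. coding) handled each token, and new experts can be added without retraining the shared base.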