MoEITS: A Green AI approach for simplifying MoE-LLMs

Large language models (LLMs) are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the quest for more powerful architectures, Mixture-of-Experts (MoE) models, inspired by ensemble models, have emerged as one of the most effective approaches. However, they impose a high computational burden for both training and inference. To reduce the computing and memory footprint as well as the energy consumption, simplification methods have arisen as very effective procedures. In this paper, an original algorithm, MoEITS, for MoE-LLM simplification is presented. The algorithm is simple by design and is underpinned by standardized information-theoretic frameworks. MoEITS is analyzed in depth from theoretical and practical points of view, and its computational complexity is studied. Its performance, in terms of the accuracy of the simplified LLMs and the reduction rate achieved, is assessed through thoroughly designed experiments. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied to Mixtral $8\times7$B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The extensive experiments conducted demonstrate that MoEITS outperforms state-of-the-art techniques by generating models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.

