Large language models (LLMs) are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the quest for more powerful architectures, Mixture-of-Experts (MoE) models, inspired by ensemble models, have emerged as one of the most effective directions. However, MoE architectures impose a high computational burden for both training and inference. To reduce compute, memory footprint, and energy consumption, simplification methods have arisen as very effective procedures. This paper presents MoEITS, an original algorithm for simplifying MoE-LLMs. The algorithm is deliberately simple and grounded in standardized information-theoretic frameworks. MoEITS is analyzed in depth from both theoretical and practical points of view, and its computational complexity is studied. The accuracy of the simplified LLMs and the reduction rate achieved are assessed through carefully designed experiments. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied to Mixtral $8\times7$B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The experiments show that MoEITS outperforms state-of-the-art techniques, producing models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.