Large language models (LLMs) have achieved unprecedented advances across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to emergent abilities (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, the literature still lacks a systematic and comprehensive review of MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, then propose a new taxonomy of MoE. Next, we overview the core designs of various MoE models, covering both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations, and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.
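To make the MoE structure mentioned above concrete, the following is a minimal NumPy sketch of a sparse MoE layer with top-k gating: a gating network scores every expert per token, only the top-k experts run, and their outputs are combined with renormalized gate weights. This is an illustrative simplification (class and parameter names such as `MoELayer`, `w_gate`, and `top_k` are our own, not from any specific paper), omitting load-balancing losses and capacity constraints discussed in the survey body.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Sketch of a sparse MoE layer: a gating network routes each token to
    its top-k experts; the layer output is the gate-weighted sum of the
    chosen experts' outputs (hypothetical minimal implementation)."""

    def __init__(self, d_model, d_hidden, num_experts, top_k=2):
        self.top_k = top_k
        self.w_gate = rng.normal(0, 0.02, (d_model, num_experts))
        # Each expert is a small two-layer feed-forward network.
        self.w1 = rng.normal(0, 0.02, (num_experts, d_model, d_hidden))
        self.w2 = rng.normal(0, 0.02, (num_experts, d_hidden, d_model))

    def forward(self, x):
        # x: (tokens, d_model)
        logits = x @ self.w_gate                               # (tokens, E)
        topk = np.argsort(logits, axis=-1)[:, -self.top_k:]    # chosen experts
        # Renormalize gate scores over the selected experts only.
        gates = softmax(np.take_along_axis(logits, topk, axis=-1))
        out = np.zeros_like(x)
        for t in range(x.shape[0]):          # only top-k experts run per token
            for slot in range(self.top_k):
                e = topk[t, slot]
                h = np.maximum(x[t] @ self.w1[e], 0.0)  # ReLU expert FFN
                out[t] += gates[t, slot] * (h @ self.w2[e])
        return out

moe = MoELayer(d_model=8, d_hidden=16, num_experts=4, top_k=2)
y = moe.forward(rng.normal(size=(3, 8)))
```

The key property is that per-token compute grows with `top_k`, not with `num_experts`, which is why MoE can scale total parameter count with little extra computation.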