Mixture-of-experts (MoE) has been widely adopted to scale large language models to trillion-plus parameters while keeping the computational cost fixed. Developing large MoE models in distributed settings suffers from heavy communication overhead: with popular models and frameworks, the inter-device communication of a MoE layer can occupy 47% of the entire model execution time. Existing methods therefore propose pipelining the communication in a MoE layer with computation so that the two overlap. However, these coarse-grained overlapping schemes noticeably impair computational efficiency, and the resulting latency hiding is sub-optimal. To this end, we present COMET, an optimized MoE system with fine-grained communication-computation overlapping. Leveraging data dependency analysis and task rescheduling, COMET achieves precise fine-grained overlapping of communication and computation. Through adaptive workload assignment, COMET effectively eliminates fine-grained communication bottlenecks and enhances its adaptability across various scenarios. Our evaluation shows that COMET accelerates the execution of a single MoE layer by $1.96\times$ and delivers a $1.71\times$ end-to-end speedup on average. COMET has been deployed in production clusters with tens of thousands of GPUs, saving millions of GPU hours.
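To make the pipelining idea concrete, the following is a minimal sketch of communication-computation overlapping: a token batch is split into chunks, and while chunk $i$ is being computed, the "communication" for chunk $i{+}1$ runs concurrently. All names and the thread-based pipeline are illustrative assumptions for exposition; they are not COMET's actual implementation, which operates at a much finer granularity inside fused GPU kernels.

```python
# Illustrative sketch (assumption, not COMET's code): overlap a chunk's
# computation with the next chunk's communication using a background thread.
from concurrent.futures import ThreadPoolExecutor

def communicate(chunk):          # stand-in for all-to-all token dispatch
    return [t * 2 for t in chunk]

def compute(chunk):              # stand-in for expert FFN computation
    return [t + 1 for t in chunk]

def moe_layer_pipelined(tokens, n_chunks=4):
    size = (len(tokens) + n_chunks - 1) // n_chunks
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    out = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        # Prefetch communication for chunk 0; thereafter, while chunk i
        # computes, chunk i+1's communication proceeds in the background.
        fut = comm.submit(communicate, chunks[0])
        for i in range(len(chunks)):
            received = fut.result()
            if i + 1 < len(chunks):
                fut = comm.submit(communicate, chunks[i + 1])
            out.extend(compute(received))
    return out
```

At coarse granularity (few, large chunks) the pipeline leaves long bubbles at its head and tail; shrinking the chunks reduces those bubbles, which is the intuition behind the fine-grained overlapping the abstract advocates.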