AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation

Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao

From arXiv; Findings of ACL 2023

Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a homogeneous design where the same number of experts of the same size are placed uniformly throughout the network. Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design. To this end, we develop AutoMoE -- a framework for designing heterogeneous MoEs under computational constraints. AutoMoE leverages Neural Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with 4x inference speedup (CPU) and FLOPs reduction over manually designed Transformers, with parity in BLEU score over dense Transformer and within 1 BLEU point of MoE SwitchTransformer, on aggregate over benchmark datasets for NMT. A heterogeneous search space with dense and sparsely activated Transformer modules (e.g., how many experts? where to place them? what should be their sizes?) allows for adaptive compute -- where different amounts of computation are used for different tokens in the input. Adaptivity comes naturally from routing decisions which send tokens to experts of different sizes. AutoMoE code, data, and trained models are available at https://aka.ms/AutoMoE.
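The adaptive-compute idea in the abstract can be illustrated with a small sketch. This is not the AutoMoE implementation: it assumes top-1 (Switch-style) routing, ReLU feed-forward experts, and a rough FLOP estimate of two operations per multiply-add, and the names `make_expert` and `hetero_moe_layer` are hypothetical. It only shows how routing tokens to experts of different hidden sizes gives different tokens different compute costs.

```python
import numpy as np

def make_expert(rng, d_model, d_hidden):
    """One feed-forward expert (hypothetical helper): d_model -> d_hidden -> d_model."""
    return {
        "w_in": rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model),
        "w_out": rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden),
    }

def hetero_moe_layer(tokens, router_w, experts):
    """Top-1 routing over experts whose hidden sizes may differ (assumed scheme).

    Returns the layer output, the chosen expert index per token, and an
    approximate per-token FLOP count for the expert matmuls.
    """
    n_tokens, d_model = tokens.shape
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    choice = logits.argmax(axis=-1)                 # top-1 expert per token

    # Softmax gate value of the chosen expert, computed stably.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    gate = probs[np.arange(n_tokens), choice]

    out = np.zeros_like(tokens)
    flops = np.zeros(n_tokens, dtype=np.int64)
    for idx, expert in enumerate(experts):
        mask = choice == idx
        if not mask.any():
            continue
        hidden = np.maximum(tokens[mask] @ expert["w_in"], 0.0)   # ReLU
        out[mask] = gate[mask][:, None] * (hidden @ expert["w_out"])
        d_hidden = expert["w_in"].shape[1]
        # Two matmuls per expert, counted as 2 FLOPs per multiply-add.
        flops[mask] = 2 * 2 * d_model * d_hidden
    return out, choice, flops

# Usage: two experts of different sizes, so per-token cost depends on routing.
rng = np.random.default_rng(0)
d_model, hidden_sizes = 8, [16, 64]
experts = [make_expert(rng, d_model, h) for h in hidden_sizes]
router_w = rng.standard_normal((d_model, len(experts)))
tokens = rng.standard_normal((6, d_model))
out, choice, flops = hetero_moe_layer(tokens, router_w, experts)
for t, (e, f) in enumerate(zip(choice, flops)):
    print(f"token {t}: expert {e} (hidden={hidden_sizes[e]}), ~{f} FLOPs")
```

In this toy setup a token sent to the 64-unit expert costs four times as much as one sent to the 16-unit expert, which is the sense in which the routing decision itself sets the amount of computation per token.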

