Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation

Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the need to learn representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method. Code: https://github.com/CONE-MT/Lego-MT
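The plug-and-play idea in the abstract can be illustrated with a minimal sketch. It is not the Lego-MT implementation: the class `DetachableTranslator`, the helpers `toy_encoder` and `make_toy_decoder`, and the list-of-floats "shared space" are all hypothetical stand-ins for neural branches. The sketch only shows the structural point that each language owns its own branch, and that translating one direction needs only the source-language encoder branch and the target-language decoder branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]  # stand-in for a representation in the unified space


def toy_encoder(text: str) -> Vector:
    """Hypothetical encoder branch: maps text to code points (not a real model)."""
    return [float(ord(ch)) for ch in text]


def make_toy_decoder(lang: str) -> Callable[[Vector], str]:
    """Hypothetical decoder branch: rebuilds the text and tags it with `lang`."""
    def decode(shared: Vector) -> str:
        return f"<{lang}> " + "".join(chr(int(v)) for v in shared)
    return decode


@dataclass
class DetachableTranslator:
    """Toy multi-way model with one encoder and one decoder branch per language."""
    encoders: Dict[str, Callable[[str], Vector]] = field(default_factory=dict)
    decoders: Dict[str, Callable[[Vector], str]] = field(default_factory=dict)

    def attach(self, lang: str,
               encoder: Callable[[str], Vector],
               decoder: Callable[[Vector], str]) -> None:
        # Plug in the branches for one language.
        self.encoders[lang] = encoder
        self.decoders[lang] = decoder

    def detach(self, lang: str) -> None:
        # Unplug a language; the remaining branches are untouched.
        self.encoders.pop(lang, None)
        self.decoders.pop(lang, None)

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Only the src encoder branch and tgt decoder branch are used.
        if src not in self.encoders or tgt not in self.decoders:
            raise KeyError(f"branch not attached for direction {src}->{tgt}")
        shared = self.encoders[src](text)
        return self.decoders[tgt](shared)


model = DetachableTranslator()
for lang in ("en", "fr"):
    model.attach(lang, toy_encoder, make_toy_decoder(lang))
result = model.translate("hi", "en", "fr")
```

Because every branch reads from or writes to the same intermediate representation, detaching one language leaves the other directions usable, which is the property the abstract refers to as plug-and-play training and inference.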
