Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

With multilingual machine translation (MMT) models continuing to grow in size and in the number of languages they support, it is natural to reuse and upgrade existing models, rather than train new ones from scratch, to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. Moreover, how to reuse existing models while also changing their architecture to provide capacity for both old and new languages has not been closely studied. In this work, we introduce three techniques that speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) up-sampling data, it is possible to exceed the performance of a same-sized baseline model with 30% of the computation, and to recover the performance of a larger model trained from scratch with over a 50% reduction in computation. Furthermore, our analysis shows that the introduced techniques help the model learn the new translation directions more effectively while also alleviating catastrophic forgetting. We hope our work will guide research into more efficient approaches to adding languages to MMT models and, ultimately, maximize the reuse of existing models.
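
The abstract does not specify how embeddings are reused when the vocabulary changes, or how data is up-sampled. The snippet below is a minimal sketch under several assumptions. Vocabularies are plain `dict` token-to-id mappings with contiguous ids, and embeddings are lists of rows. The sketch copies the trained vector of every token that survives the vocabulary update and randomly initializes the tokens that are new. It also computes temperature-based sampling probabilities, one common up-sampling scheme for multilingual data that draws a small new-language corpus more often than its raw share. The helper names `init_upgraded_embeddings` and `temperature_sampling_probs`, the Gaussian initialization with `std=0.02`, and the choice of temperature sampling are all hypothetical illustrations. They are not the initialization or up-sampling procedure used in this work.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def init_upgraded_embeddings(
    old_vocab: Dict[str, int],
    old_emb: Matrix,
    new_vocab: Dict[str, int],
    std: float = 0.02,
    seed: int = 0,
) -> Matrix:
    """Build an embedding table for new_vocab, reusing rows of old_emb.

    Hypothetical helper: tokens shared by both vocabularies keep their
    trained vectors; tokens that exist only in the new vocabulary get a
    small Gaussian random initialization (an assumed scheme). Assumes the
    ids in new_vocab are exactly 0 .. len(new_vocab) - 1.
    """
    if not old_emb:
        raise ValueError("old_emb must contain at least one row")
    rng = random.Random(seed)  # seeded so the new rows are reproducible
    dim = len(old_emb[0])
    new_emb: Matrix = [[] for _ in range(len(new_vocab))]
    for token, new_id in new_vocab.items():
        old_id = old_vocab.get(token)
        if old_id is not None:
            # Shared token: copy the trained vector into its new row index.
            new_emb[new_id] = list(old_emb[old_id])
        else:
            # New token: no trained vector exists, so sample a small one.
            new_emb[new_id] = [rng.gauss(0.0, std) for _ in range(dim)]
    return new_emb


def temperature_sampling_probs(
    sizes: Dict[str, int], temperature: float
) -> Dict[str, float]:
    """Return p_l proportional to (n_l / N) ** (1 / T) for each corpus l.

    T = 1 reproduces the raw data proportions; larger T flattens them,
    which up-samples small (e.g. newly added) languages.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


if __name__ == "__main__":
    old_vocab = {"<pad>": 0, "hello": 1, "world": 2}
    old_emb = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
    # The vocabulary update reorders shared tokens and adds "namaste".
    new_vocab = {"<pad>": 0, "world": 1, "hello": 2, "namaste": 3}
    emb = init_upgraded_embeddings(old_vocab, old_emb, new_vocab)
    print(emb[1], emb[2])  # [3.0, 4.0] [1.0, 2.0]
    print(temperature_sampling_probs({"old": 900, "new": 100}, 5.0))
```

Take corpora of 900 and 100 sentences as an example. A temperature of 1 keeps the new language at its raw 10% share, while a temperature of 5 raises it to roughly 39%. Copying rows by token rather than by index matters because a rebuilt vocabulary generally assigns shared tokens different ids.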
