Scaling Laws for Multilingual Neural Machine Translation - 专知论文

会员服务 ·

0

缩放 · MoDELS · Machine Translation · Performer · Weight ·

2023 年 2 月 19 日

Scaling Laws for Multilingual Neural Machine Translation

翻译：多语言神经机器翻译的缩放定律

Patrick Fernandes,Behrooz Ghorbani,Xavier Garcia,Markus Freitag,Orhan Firat

from arxiv, 19 pages, 20 figures

In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the training mixture composition on the scaling behavior. We find that changing the weightings of the individual language pairs in the training mixture only affect the multiplicative factor of the scaling law. In particular, we observe that multilingual models trained using different mixing rates all exhibit the same scaling exponent. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, the direction of the multilinguality plays a significant role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, significantly reducing efforts required for language balancing in large multilingual models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.

翻译：本文对多语言神经机器翻译模型的缩放特性进行了大规模实证研究。我们考察了模型规模增长对性能的影响，并探讨了训练混合成分对缩放行为的作用。研究发现，调整训练混合中不同语言对的权重仅影响缩放定律的乘性因子。特别地，我们观察到采用不同混合率训练的多语言模型均表现出相同的缩放指数。通过创新的联合缩放定律公式，我们计算了分配给每个语言对的有效参数数量，并检验了语言相似性对模型缩放行为的影响。实验表明语言相似性几乎没有产生任何影响，而多语言方向则发挥重要作用——从多种语言翻译至英语的模型相比逆方向模型，每项任务具有更多有效参数。最后，我们利用这些观测结果预测采用任意语言权重训练的、任意规模的多语言模型性能，显著减少了大型多语言模型中语言平衡所需的工作量。研究发现同时适用于领域内和领域外测试集，以及ChrF和BLEURT等多种评估指标。

0

相关内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

247+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

认知天波超视距雷达低可探测目标检测方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

控释VEGF/NT-3脊髓脱细胞支架在SCI模型中的血管化及神经再生研究

国家自然科学基金

0+阅读 · 2013年12月31日

青海湖高寒湿地生态系统CO2、水汽和热通量耦合及通量组分研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

机械动态装配可靠性设计方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

养殖海域浮游动物群落结构演变对环境变化的生物地球化学响应

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

原位诱导反应性星型胶质细胞增殖分化促进脊髓神经功能恢复的实验研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于高分辨率遥感影像马尾松毛虫害的信息提取技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

Bayesian Optimization of Catalysts With In-context Learning

Arxiv

0+阅读 · 2023年4月11日

Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks

Arxiv

1+阅读 · 2023年4月11日

Incorporating Structured Sentences with Time-enhanced BERT for Fully-inductive Temporal Relation Prediction

Arxiv

0+阅读 · 2023年4月10日

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Arxiv

0+阅读 · 2023年4月10日

MiLMo:Minority Multilingual Pre-trained Language Model

Arxiv

1+阅读 · 2023年4月10日

VOICE: Visual Oracle for Interaction, Conversation, and Explanation

Arxiv

0+阅读 · 2023年4月8日

On the Pareto Front of Multilingual Neural Machine Translation

Arxiv

0+阅读 · 2023年4月7日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Entity Context and Relational Paths for Knowledge Graph Completion

Arxiv

29+阅读 · 2020年2月17日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

VIP会员

文章信息

相关主题

Machine Translation

最新内容

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

7+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

3+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

1+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

4+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

7+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

9+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

综述 | 遥感多模态大模型：领域专用还是通用模型？

综述 | 遥感多模态大模型：领域专用还是通用模型？

专知会员服务

5+阅读 · 7月25日

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

专知会员服务

5+阅读 · 7月25日

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

247+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

美空军新型反无人机部队初探

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

相关资讯

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

Bayesian Optimization of Catalysts With In-context Learning

Arxiv

0+阅读 · 2023年4月11日

Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks

Arxiv

1+阅读 · 2023年4月11日

Incorporating Structured Sentences with Time-enhanced BERT for Fully-inductive Temporal Relation Prediction

Arxiv

0+阅读 · 2023年4月10日

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Arxiv

0+阅读 · 2023年4月10日

MiLMo:Minority Multilingual Pre-trained Language Model

Arxiv

1+阅读 · 2023年4月10日

VOICE: Visual Oracle for Interaction, Conversation, and Explanation

Arxiv

0+阅读 · 2023年4月8日

On the Pareto Front of Multilingual Neural Machine Translation

Arxiv

0+阅读 · 2023年4月7日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Entity Context and Relational Paths for Knowledge Graph Completion

Arxiv

29+阅读 · 2020年2月17日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

相关基金

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

认知天波超视距雷达低可探测目标检测方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

控释VEGF/NT-3脊髓脱细胞支架在SCI模型中的血管化及神经再生研究

国家自然科学基金

0+阅读 · 2013年12月31日

青海湖高寒湿地生态系统CO2、水汽和热通量耦合及通量组分研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

机械动态装配可靠性设计方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

养殖海域浮游动物群落结构演变对环境变化的生物地球化学响应

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

原位诱导反应性星型胶质细胞增殖分化促进脊髓神经功能恢复的实验研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于高分辨率遥感影像马尾松毛虫害的信息提取技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员