Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation. Through systematic experimentation, we find that interference (or synergy) is primarily determined by model size, data size, and the proportion of each language pair within the total dataset. We observe that substantial interference occurs mainly when the model is very small with respect to the available training data, and that using standard transformer configurations with fewer than one billion parameters largely alleviates interference and promotes synergy. Moreover, we show that tuning the sampling temperature to control the proportion of each language pair in the data is key to effectively balancing the amount of interference between low- and high-resource language pairs, and can lead to superior performance overall.
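To illustrate how a sampling temperature can control the proportion of each language pair, the following is a minimal sketch. The abstract does not give a formula, so the sketch assumes the commonly used form in which pair i is sampled with probability proportional to (n_i / N)^(1/T), where n_i is the number of training examples for that pair and N is the total. The function name, the language pair labels, and the dataset sizes are hypothetical and chosen only for illustration.

```python
def sampling_proportions(pair_sizes, temperature):
    """Return per-pair sampling probabilities proportional to (n_i / N) ** (1 / T).

    Assumed formulation (not taken from the abstract): T = 1 reproduces the
    raw data proportions, and larger T flattens them toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(pair_sizes.values())
    # Scale each pair's share of the data by the exponent 1 / T.
    weights = {
        pair: (size / total) ** (1.0 / temperature)
        for pair, size in pair_sizes.items()
    }
    # Renormalize so the probabilities sum to one.
    norm = sum(weights.values())
    return {pair: weight / norm for pair, weight in weights.items()}


# Hypothetical dataset sizes: one high-resource and one low-resource pair.
sizes = {"en-fr": 9_000_000, "en-gu": 1_000_000}

proportional = sampling_proportions(sizes, temperature=1.0)
flattened = sampling_proportions(sizes, temperature=5.0)

for name, probs in (("T=1", proportional), ("T=5", flattened)):
    print(name, {pair: round(p, 3) for pair, p in probs.items()})
```

Under this assumed formulation, raising the temperature shifts sampling mass from the high-resource pair toward the low-resource pair, which is the knob the abstract describes as key to balancing interference between the two.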