Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation. Through systematic experimentation, we find that interference (or synergy) is primarily determined by model size, data size, and the proportion of each language pair within the total dataset. We observe that substantial interference occurs mainly when the model is very small with respect to the available training data, and that using standard transformer configurations with fewer than one billion parameters largely alleviates interference and promotes synergy. Moreover, we show that tuning the sampling temperature to control the proportion of each language pair in the data is key to balancing the amount of interference between low- and high-resource language pairs effectively, and can lead to superior performance overall.
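The abstract does not spell out how the sampling temperature sets each language pair's share of the training data. A minimal sketch, assuming the commonly used scheme in which pair i is sampled with probability proportional to (n_i / N)^(1/T), where n_i is its number of training examples and N the total, might look like the following. The function name `temperature_sampling_probs` and the corpus sizes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def temperature_sampling_probs(pair_sizes, temperature):
    """Sampling probability per language pair, assuming p_i is proportional to (n_i / N) ** (1 / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(pair_sizes.values())
    # Raise each pair's data share to the power 1/T; T = 1 keeps the raw proportions.
    weights = {
        pair: (n / total) ** (1.0 / temperature)
        for pair, n in pair_sizes.items()
    }
    # Renormalise so the probabilities sum to 1.
    norm = sum(weights.values())
    return {pair: w / norm for pair, w in weights.items()}


# Hypothetical corpus sizes (number of sentence pairs), for illustration only.
sizes = {"en-fr": 9_000_000, "en-gu": 1_000_000}

proportional = temperature_sampling_probs(sizes, temperature=1.0)
flattened = temperature_sampling_probs(sizes, temperature=5.0)
```

Under this assumed scheme, T = 1 samples pairs in proportion to their data size, while larger T moves the distribution toward uniform, giving low-resource pairs a larger share at the expense of high-resource ones.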