Multilingual machine translation (MMT), trained on a mixture of parallel and monolingual data, is key to improving translation in low-resource language pairs. However, the literature offers conflicting results on the performance of different methods. To resolve this conflict, we examine how denoising autoencoding (DAE) and backtranslation (BT) impact MMT under different data conditions and model scales. Unlike prior studies, we use a realistic dataset of 100 directions and consider many domain combinations of monolingual and test data. We find that monolingual data generally helps MMT, but models are surprisingly brittle to domain mismatches, especially at smaller model scales. BT is beneficial when the parallel, monolingual, and test data sources are similar but can be detrimental otherwise, while DAE is less effective than previously reported. Next, we analyze the impact of scale (from 90M to 1.6B parameters) and find that it is important for both methods, particularly DAE. As scale increases, DAE goes from underperforming the parallel-only baseline at 90M to matching BT performance at 1.6B, and even surpasses BT in low-resource settings. These results offer new insights into how best to use monolingual data in MMT.