Multilingual machine translation (MMT) benefits from cross-lingual transfer but is a challenging multitask optimization problem, partly because there is no clear framework for systematically learning language-specific parameters. Self-supervised learning (SSL) approaches that leverage large quantities of monolingual data (where parallel data is unavailable) have shown promise as complementary tasks that improve translation performance. However, jointly optimizing SSL and MMT tasks is even more challenging. In this work, we first investigate how to utilize intra-distillation to learn more *language-specific* parameters and then show the importance of these language-specific parameters. Next, we propose a novel but simple SSL task, concurrent denoising, that co-trains with the MMT task by concurrently denoising monolingual data on both the encoder and decoder. Finally, we apply intra-distillation to this co-training approach. Combining these two approaches significantly improves MMT performance, outperforming three state-of-the-art SSL methods by a large margin, e.g., improvements of 11.3% and 3.7% over MASS on an 8-language and a 15-language benchmark, respectively.
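As a rough illustration of what a denoising task on monolingual data involves, the sketch below corrupts a monolingual sentence by token masking and pairs it with the clean sentence as the reconstruction target. This is a generic denoising setup under stated assumptions, not the concurrent denoising procedure itself: the abstract does not specify the noise function or how the encoder and decoder objectives are formed. The names `MASK`, `add_noise`, and `mask_prob` are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; a real vocabulary may use a different token


def add_noise(tokens, mask_prob=0.3, seed=None):
    """Return (noisy_tokens, clean_tokens) for one monolingual sentence.

    Each token is independently replaced by MASK with probability mask_prob.
    Token masking is an assumed noise function chosen for illustration only.
    """
    rng = random.Random(seed)
    noisy = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return noisy, list(tokens)


if __name__ == "__main__":
    sentence = "multilingual translation benefits from cross-lingual transfer".split()
    noisy, clean = add_noise(sentence, mask_prob=0.3, seed=0)
    # A denoising model would read `noisy` and be trained to reconstruct `clean`.
    print(noisy)
    print(clean)
```

In a co-training setup, pairs like these would be mixed with parallel MMT batches so that both objectives update the same model; how the two losses are weighted is left open here.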