Multilingual pretraining and fine-tuning have been remarkably successful across a variety of natural language processing tasks. Transferring representations from one language to another is especially crucial for cross-lingual learning. One might expect machine translation objectives to be well suited to fostering such capabilities, since they involve the explicit alignment of semantically equivalent sentences in different languages. This paper investigates the potential benefits of using machine translation as a continued training objective to enhance language representation learning, bridging multilingual pretraining and cross-lingual applications. We study this question through two lenses: a quantitative evaluation of the performance of existing models and an analysis of their latent representations. Our results show that, contrary to expectations, machine translation as a continued training objective fails to enhance cross-lingual representation learning on multiple cross-lingual natural language understanding tasks. We conclude that explicit sentence-level alignment in the cross-lingual scenario is detrimental to cross-lingual transfer pretraining, which has important implications for future cross-lingual transfer studies. Furthermore, we provide evidence, through similarity measures and an investigation of parameters, that this lack of positive influence is due to output separability, which we argue is useful for machine translation but detrimental elsewhere.