Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often suffers from poor zero-shot (ZS) translation quality. While prior work has explored the causes of overall low ZS performance, our work introduces a fresh perspective: the high variation in ZS performance across translation directions. This variation suggests that MNMT does not uniformly exhibit poor ZS capability; instead, certain translation directions yield reasonable results. Through systematic experimentation involving 1,560 language directions spanning 40 languages, we identify three key factors contributing to high variation in ZS NMT performance: 1) target-side translation capability, 2) vocabulary overlap, and 3) linguistic properties. Our findings highlight that target-side translation quality is the most influential factor, with vocabulary overlap consistently impacting ZS performance. Additionally, linguistic properties, such as language family and writing system, play a role, particularly with smaller models. Furthermore, we suggest that the off-target issue is a symptom of inadequate ZS performance, emphasizing that zero-shot translation challenges extend beyond addressing the off-target problem. We release the data and models as a benchmark for future research on zero-shot translation at https://github.com/Smu-Tan/ZS-NMT-Variations.
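The abstract names vocabulary overlap as a factor without defining how it is measured. As a minimal sketch, assuming overlap is quantified as the Jaccard similarity of two languages' token (e.g. subword) vocabularies, it could be computed as below; this is an illustrative assumption, not necessarily the measure used in the paper. The sketch also checks the direction count: treating each ordered (source, target) pair as a distinct direction, 40 languages give 40 × 39 = 1,560 directions. The helper names `jaccard_overlap` and `num_directions` and the toy vocabularies are hypothetical.

```python
from itertools import permutations


def jaccard_overlap(vocab_a: set[str], vocab_b: set[str]) -> float:
    """Hypothetical overlap measure: |A ∩ B| / |A ∪ B| of two token vocabularies."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # two empty vocabularies: define overlap as zero
    return len(vocab_a & vocab_b) / len(union)


def num_directions(n_languages: int) -> int:
    """Count ordered (source, target) pairs with source != target, i.e. n * (n - 1)."""
    return sum(1 for _ in permutations(range(n_languages), 2))


if __name__ == "__main__":
    # Toy subword vocabularies (assumed for illustration only).
    de = {"▁der", "▁die", "▁Haus", "▁in"}
    nl = {"▁de", "▁het", "▁huis", "▁in"}
    print(f"{jaccard_overlap(de, nl):.3f}")  # → 0.143 (1 shared token out of 7)
    print(num_directions(40))  # → 1560
```

Jaccard similarity is used here only because it is simple and symmetric; an asymmetric measure (e.g. the share of target-language tokens also seen in the source) would be an equally plausible choice.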