Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often suffers from poor zero-shot (ZS) translation quality. While prior work has explored the causes of overall low ZS performance, our work introduces a fresh perspective: the presence of high variation in ZS performance. This suggests that MNMT does not uniformly exhibit poor ZS capability; instead, certain translation directions yield reasonable results. Through systematic experimentation involving 1,560 language directions spanning 40 languages, we identify three key factors contributing to high variation in ZS NMT performance: 1) target-side translation capability, 2) vocabulary overlap, and 3) linguistic properties. Our findings highlight that target-side translation quality is the most influential factor, with vocabulary overlap consistently impacting ZS performance. Additionally, linguistic properties, such as language family and writing system, play a role, particularly with smaller models. Furthermore, we suggest that the off-target issue is a symptom of inadequate ZS performance, emphasizing that zero-shot translation challenges extend beyond addressing the off-target problem. We release the data and models as a benchmark for future research on zero-shot translation at https://github.com/Smu-Tan/ZS-NMT-Variations