Massively multilingual machine translation models allow a large number of languages to be translated with a single model, but perform poorly on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has simply averaged the probability distributions from multiple paths, but we find that this performs worse than using a single pivot and exacerbates the hallucination problem, because the same hallucinations can be probable across different paths. We propose MaxEns, a novel combination strategy that biases the output towards the most confident predictions, hypothesising that confident predictions are less prone to hallucination. We evaluate the different strategies on the FLORES benchmark for 20 low-resource language directions, showing that MaxEns improves translation quality for low-resource languages while reducing hallucination, compared with both direct translation and the averaging approach. On average, however, multi-pivot strategies still lag behind using English as a single pivot, raising the question of how to identify the best pivoting strategy for a given translation direction.
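The abstract contrasts averaging per-path probability distributions with a max-confidence combination, but does not spell out the exact MaxEns rule. The sketch below illustrates the two strategies on toy per-token distributions; the function names and the specific selection rule (keep the path whose top prediction is most confident) are illustrative assumptions, not the paper's precise method.

```python
def average_ensemble(dists):
    """Simple averaging: mean the per-token probability
    distributions produced by several pivot paths."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]


def maxens(dists):
    """Hypothetical max-confidence sketch: keep the whole
    distribution from the path whose single most probable
    token has the highest probability (most confident path).
    The real MaxEns rule may differ; this is an assumption."""
    return max(dists, key=max)


# Two pivot paths proposing distributions over a 3-token vocabulary.
paths = [
    [0.5, 0.3, 0.2],  # path A: top prediction at 0.5
    [0.2, 0.7, 0.1],  # path B: top prediction at 0.7 (more confident)
]

avg = average_ensemble(paths)  # averaging dilutes both paths
out = maxens(paths)            # max-confidence keeps path B
```

Note how averaging can keep a shared hallucination probable across paths, whereas the max-confidence rule commits to the single most confident path's distribution.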