Massively multilingual machine translation models can translate between a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has used a simple averaging of probability distributions from multiple paths, but we find that this performs worse than using a single pivot, and exacerbates the hallucination problem because the same hallucinations can be probable across different paths. As an alternative, we propose MaxEns, a combination strategy that is biased towards the most confident predictions, hypothesising that confident predictions are less likely to be hallucinations. We evaluate different strategies on the FLORES benchmark for 20 low-resource language directions, demonstrating that MaxEns improves translation quality for low-resource languages while reducing hallucination in translations, compared to both direct translation and an averaging approach. On average, multi-pivot strategies still lag behind using English as a single pivot language, raising the question of how to identify the best pivoting strategy for a given translation direction.
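To make the contrast between the two combination strategies concrete, the following is a minimal sketch of one decoding step. The abstract does not give the exact formulation of MaxEns, so the sketch assumes one plausible reading: take the element-wise maximum of the next-token probabilities across pivot paths and renormalise. The function names, the four-token vocabulary, and the probability values are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the names and the max-then-renormalise rule are
# assumptions, not the paper's definition of MaxEns.

def average_ensemble(dists):
    """Average next-token distributions from several pivot paths."""
    n = len(dists)
    vocab = len(dists[0])
    return [sum(d[i] for d in dists) / n for i in range(vocab)]


def max_ensemble(dists):
    """Combine paths by the element-wise maximum, then renormalise.

    A token that any single path predicts with high confidence keeps a
    high score, which biases the result towards confident predictions.
    """
    vocab = len(dists[0])
    peaks = [max(d[i] for d in dists) for i in range(vocab)]
    total = sum(peaks)
    return [p / total for p in peaks]


def argmax(dist):
    """Index of the highest-probability token."""
    return max(range(len(dist)), key=lambda i: dist[i])


# Hypothetical next-token distributions over a 4-token vocabulary,
# one per pivot path. Path A is confident in token 0; paths B and C
# agree on token 1 but with lower confidence.
paths = [
    [0.70, 0.10, 0.10, 0.10],  # path A
    [0.20, 0.55, 0.15, 0.10],  # path B
    [0.20, 0.55, 0.15, 0.10],  # path C
]

avg = average_ensemble(paths)
mx = max_ensemble(paths)

print("average picks token", argmax(avg))
print("max-based picks token", argmax(mx))
```

In this toy case the averaged distribution follows the two paths that agree with each other, while the max-based combination follows the single most confident path. Whether the confident path is in fact the less hallucinated one is the hypothesis the paper tests; the sketch only shows how the two rules can diverge.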