While multilingual neural machine translation has achieved great success, it suffers from the off-target issue, where the translation is produced in the wrong language. This problem is more pronounced in zero-shot translation tasks. In this work, we find that failing to encode a discriminative target-language signal leads to off-target translation, and that a smaller lexical distance (i.e., KL-divergence) between two languages' vocabularies is associated with a higher off-target rate. We also find that simply isolating the vocabularies of different languages in the decoder can alleviate the problem. Motivated by these findings, we propose Language Aware Vocabulary Sharing (LAVS), a simple and effective algorithm for constructing the multilingual vocabulary that greatly alleviates the off-target problem of the translation model by increasing the KL-divergence between languages. We conduct experiments on a multilingual machine translation benchmark covering 11 languages. Experiments show that the off-target rate across 90 translation tasks drops from 29\% to 8\%, while the overall BLEU score improves by 1.9 points on average, without extra training cost or any loss in performance on the supervised directions. We release the code at \href{https://github.com/chenllliang/Off-Target-MNMT}{https://github.com/chenllliang/Off-Target-MNMT} for reproducibility.
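The abstract uses KL-divergence between two languages' vocabularies as a measure of lexical distance but does not say how the underlying distributions are built. The sketch below is only an illustration under stated assumptions: each language is represented by an add-one-smoothed unigram distribution over a shared subword vocabulary, and $\mathrm{KL}(P \,\|\, Q) = \sum_t P(t) \log \frac{P(t)}{Q(t)}$ is computed in nats. The function names, the smoothing choice, the direction of the divergence, and the toy token lists are all hypothetical and are not taken from the LAVS code.

```python
import math
from collections import Counter


def token_distribution(tokens, vocab, alpha=1.0):
    """Hypothetical helper: add-alpha smoothed unigram distribution over a fixed shared vocab.

    Tokens outside `vocab` are ignored; smoothing keeps every probability strictly positive.
    """
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share keys and have strictly positive values."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


# Toy shared subword vocabulary and toy "corpora" (illustrative only, not real data).
vocab = ["_the", "_der", "_de", "ing", "en", "s"]
p_en = token_distribution(["_the", "ing", "s", "_the", "ing"], vocab)
p_de = token_distribution(["_der", "en", "en", "s", "_der"], vocab)
p_nl = token_distribution(["_de", "en", "en", "s", "_de"], vocab)

# German shares more subwords with Dutch than with English in this toy setup,
# so KL(de || nl) = (2/11) ln 3 is smaller than KL(de || en) = (4/11) ln 3.
print(kl_divergence(p_de, p_nl), kl_divergence(p_de, p_en))
```

Under the abstract's finding, a pair like the toy German/Dutch one (smaller divergence) would be the more off-target-prone direction; LAVS is described as constructing the vocabulary so that this divergence increases, which the sketch above does not attempt to reproduce.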