Large-scale pretrained language models (LLMs), such as ChatGPT and GPT4, have shown strong multilingual translation abilities without being explicitly trained on parallel corpora. How LLMs acquire the ability to carry out translation instructions for different languages is an interesting question. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation following given instructions. First, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a given language pair, performance depends on both the language families involved and the amount of data used in the pretraining phase. Second, we find that an LLM's ability to carry out translation instructions relies on its understanding of the translation instructions and on the alignment among different languages. With proper enhancement, LLMs can perform the translation task well even for language pairs unseen during the instruction tuning phase.