Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not carried over to translation, especially for LLMs of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate-size LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task and eliminates the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We refer to the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). With LLaMA-2 as the underlying model, our results show that the model achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. With only 7B or 13B parameters, it performs significantly better than all prior work and even surpasses the NLLB-54B model and GPT-3.5-text-davinci-003. This method establishes the foundation for a novel training paradigm in machine translation.
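The two-stage recipe described in the abstract (monolingual fine-tuning first, then fine-tuning on a small high-quality parallel set) can be pictured with a minimal sketch. This is only an illustration of the stage ordering, not the authors' training code: the names `Stage`, `build_recipe`, `run_two_stage`, and the `train_step` callback are hypothetical, and a real setup would replace `train_step` with an actual optimizer update on an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One fine-tuning stage (hypothetical container, not from the paper)."""
    name: str
    data_kind: str            # "monolingual" or "parallel"
    examples: Sequence[object]


def build_recipe(monolingual: Sequence[str],
                 parallel: Sequence[Tuple[str, str]]) -> List[Stage]:
    """Order the stages as the abstract describes: monolingual, then parallel."""
    return [
        Stage("stage1_monolingual", "monolingual", monolingual),
        Stage("stage2_parallel", "parallel", parallel),
    ]


def run_two_stage(stages: Sequence[Stage],
                  train_step: Callable[[Stage, object], None]) -> List[Tuple[str, int]]:
    """Run each stage in order and return (stage name, examples seen) pairs.

    `train_step` is an assumed callback standing in for one model update.
    """
    log = []
    for stage in stages:
        for example in stage.examples:
            train_step(stage, example)
        log.append((stage.name, len(stage.examples)))
    return log


if __name__ == "__main__":
    # Toy data: plain sentences for stage 1, (source, target) pairs for stage 2.
    mono = ["Ein Satz.", "Another sentence.", "Une phrase."]
    para = [("Guten Morgen.", "Good morning.")]

    def fake_step(stage: Stage, example: object) -> None:
        # Placeholder for a real gradient update.
        print(f"[{stage.name}] training on {example!r}")

    print(run_two_stage(build_recipe(mono, para), fake_step))
```

The point of the sketch is the ordering and the asymmetry in data: the first stage consumes untranslated text, while the second needs only a small number of source-target pairs.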