Translation-tailored large language models (LLMs) exhibit remarkable translation capabilities, even competing with commercial translation systems trained with supervision. However, off-target translation, where the output is written in a language other than the requested target language, remains an unsolved problem, especially for low-resource languages, and hinders the development of accurate LLM-based translation models. To mitigate off-target translation and improve the translation performance of LLMs, recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs by feeding few-shot demonstrations. However, these methods do not fundamentally improve an LLM's ability to follow translation instructions, especially the language direction information. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability of LLMs, especially with respect to the translation direction. Specifically, we first tune LLMs with the maximum likelihood estimation loss on a translation dataset to elicit basic translation capabilities. In the second stage, we construct instruction-conflicting samples by randomly replacing the translation direction in the instruction with a wrong one, and then introduce an extra unlikelihood loss to learn from those samples. Experiments with the LLaMA model on IWSLT and WMT benchmarks, spanning 16 zero-shot directions, show that, compared to the competitive baseline, a translation-finetuned LLaMA, our method effectively reduces the off-target translation ratio (-53.3\% on average), thus improving translation quality by +5.7 SacreBLEU and +16.4 BLEURT on average. Analysis shows that our method preserves the model's general task performance on AlpacaEval. Code and models will be released at \url{https://github.com/alphadl/LanguageAware_Tuning}.
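The second stage described above can be illustrated with a minimal sketch. It is not the released implementation: the language pool `LANGS`, the instruction template, the weighting factor `alpha`, and all function names are assumptions made for illustration, and only the target language is swapped when building a conflicting instruction. The losses operate on the probabilities a model assigns to each reference token, so that the likelihood term rewards the reference under the correct instruction while the unlikelihood term penalizes it under the conflicting one.

```python
import math
import random

# Hypothetical language pool; the abstract does not list the languages used.
LANGS = ["English", "German", "Dutch", "Italian"]


def make_conflicting_instruction(src, tgt, rng):
    """Build an instruction whose target language is deliberately wrong.

    Assumption: only the target side of the direction is replaced, and the
    wrong language is drawn uniformly from the remaining languages.
    """
    candidates = [lang for lang in LANGS if lang not in (src, tgt)]
    wrong = rng.choice(candidates)
    return f"Translate the following sentence from {src} to {wrong}."


def mle_loss(token_probs):
    """Mean negative log-likelihood of the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs, eps=1e-8):
    """Mean of -log(1 - p): small when the reference tokens are unlikely."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


def stage_two_loss(clean_probs, conflict_probs, alpha=1.0):
    """Likelihood on the clean sample plus weighted unlikelihood on the
    instruction-conflicting sample. `alpha` is a hypothetical weight."""
    return mle_loss(clean_probs) + alpha * unlikelihood_loss(conflict_probs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(make_conflicting_instruction("English", "German", rng))
    # Reference-token probabilities under the correct and the wrong instruction.
    print(stage_two_loss([0.8, 0.7, 0.9], [0.6, 0.5, 0.7], alpha=1.0))
```

In an actual training loop the per-token probabilities would come from the model's softmax output over the reference translation, and the combined loss would be backpropagated; the sketch only shows how the two terms pull in opposite directions.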