Large language models (LLMs) have achieved promising performance in multilingual machine translation through zero/few-shot prompting or prompt-tuning. However, because multilingual data are mixed during LLM pre-training, LLM-based translation models suffer from the off-target issue under both prompt-based methods. This issue covers a series of phenomena, namely instruction misunderstanding, translation into the wrong language, and over-generation. To address it, this paper introduces an \textbf{\underline{A}}uto-\textbf{\underline{C}}onstriction \textbf{\underline{T}}urning mechanism for \textbf{\underline{M}}ultilingual \textbf{\underline{N}}eural \textbf{\underline{M}}achine \textbf{\underline{T}}ranslation (\model), a novel supervised fine-tuning mechanism that is orthogonal to traditional prompt-based methods. \model automatically constructs a constrained template on the target side by adding trigger tokens ahead of the ground truth. The trigger tokens can be freely arranged and combined to represent different task semantics, and they can be iteratively updated to maximize the label likelihood. Experiments on WMT test sets with multiple metrics demonstrate that \model achieves substantially improved performance across multiple translation directions and reduces off-target phenomena in translation.
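The target-side template construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger-token strings, the function names, and the choice of three triggers (task, source language, target language) are all hypothetical, and the iterative update of the triggers to maximize label likelihood is not shown. The sketch only covers prepending triggers to the ground truth for fine-tuning, and checking for them in generated output.

```python
from typing import List, Optional


def trigger_tokens(task: str, src_lang: str, tgt_lang: str) -> List[str]:
    """Return an illustrative trigger-token sequence.

    The token format is an assumption made for this sketch; the paper
    describes triggers that are learned, not fixed strings like these.
    """
    return [f"<task:{task}>", f"<src:{src_lang}>", f"<tgt:{tgt_lang}>"]


def build_constrained_target(reference: str, task: str,
                             src_lang: str, tgt_lang: str) -> str:
    """Prepend the trigger tokens to the ground-truth translation.

    The result is the target-side string used as the fine-tuning label.
    """
    prefix = " ".join(trigger_tokens(task, src_lang, tgt_lang))
    return f"{prefix} {reference}"


def strip_triggers(generated: str, task: str,
                   src_lang: str, tgt_lang: str) -> Optional[str]:
    """Recover the translation from a generated string.

    Returns None when the output does not start with the expected
    triggers, which this sketch treats as a sign of an off-target output.
    """
    prefix = " ".join(trigger_tokens(task, src_lang, tgt_lang))
    if not generated.startswith(prefix):
        return None
    return generated[len(prefix):].strip()


label = build_constrained_target("Guten Morgen.", "translate", "en", "de")
translation = strip_triggers(label, "translate", "en", "de")
```

Because the triggers come first in the target sequence, the model has to commit to the task and the target language before it emits any translation tokens, which is the intuition behind constraining the target side rather than the prompt.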