Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

Large Language Models (LLMs) present strong general capabilities, and a compelling current challenge is stimulating their specialized capabilities, such as machine translation, through low-cost instruction tuning. Standard instruction-following data is organized sequentially as the concatenation of an instruction, an input, and a response. Because the attention mechanism of LLMs favors local context, LLMs tend to attend more to nearby words or sentences at each position, which creates a high risk of instruction forgetting during decoding. To alleviate this issue, we propose SWIE (Segment-Weighted Instruction Embedding) and an instruction-following dataset, OVERMISS. SWIE improves the model's instruction understanding by adding a global instruction representation to the subsequent input and response representations. OVERMISS improves model faithfulness by contrasting over-translation and miss-translation results with the correct translation. We apply our methods to two mainstream open-source LLMs, BLOOM and LLaMA. The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long-text translation, owing to the reduced risk of instruction forgetting. Additionally, OVERMISS outperforms the baseline in translation performance (e.g., BLEU gains ranging from 0.69 to 3.12 and an average COMET improvement of 0.48 percentage points for LLaMA-7b), with further enhancements in models combining OVERMISS and SWIE (e.g., BLEU gains of up to 0.56 on English-to-German translation across three different backbones), and both methods exhibit improvements in a faithfulness metric based on word alignment.
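The abstract describes SWIE only at a high level: a global instruction representation is added to the input and response representations that follow the instruction. The sketch below illustrates that idea under our own assumptions, not necessarily the paper's exact formulation: the global representation is a mean pool over instruction-token hidden states, and each later segment receives one fixed scalar weight. The names `segment_weighted_instruction_embedding`, `INSTRUCTION`, `INPUT`, and `RESPONSE` are hypothetical, and plain Python lists stand in for the tensors a real model would use.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical segment ids (an assumption, not from the paper).
INSTRUCTION, INPUT, RESPONSE = 0, 1, 2


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one instruction token")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def segment_weighted_instruction_embedding(
    hidden: Sequence[Vector],
    segments: Sequence[int],
    weights: Dict[int, float],
) -> List[Vector]:
    """Add a pooled instruction vector, scaled per segment, to later tokens.

    hidden:   one hidden-state vector per token.
    segments: one segment id per token (INSTRUCTION, INPUT or RESPONSE).
    weights:  assumed scalar weight per segment id; instruction tokens
              are returned unchanged.
    """
    if len(hidden) != len(segments):
        raise ValueError("hidden and segments must have the same length")
    # Global instruction representation: mean of instruction-token states.
    g = mean_pool([h for h, s in zip(hidden, segments) if s == INSTRUCTION])
    out: List[Vector] = []
    for h, s in zip(hidden, segments):
        if s == INSTRUCTION:
            out.append(list(h))
        else:
            w = weights.get(s, 0.0)
            out.append([x + w * gi for x, gi in zip(h, g)])
    return out


# Usage: two instruction tokens, one input token, one response token.
hidden = [[1.0, 0.0], [3.0, 2.0], [0.5, 0.5], [0.0, 1.0]]
segments = [INSTRUCTION, INSTRUCTION, INPUT, RESPONSE]
out = segment_weighted_instruction_embedding(
    hidden, segments, {INPUT: 1.0, RESPONSE: 0.5}
)
print(out)  # [[1.0, 0.0], [3.0, 2.0], [2.5, 1.5], [1.0, 1.5]]
```

Because every input and response position receives the same pooled instruction vector regardless of its distance from the instruction, the instruction signal does not fade with position. That is the intuition the abstract gives for the reduced instruction forgetting in long texts.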
