Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models

The training paradigm for machine translation has gradually shifted from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on pretrained multilingual large language models (LLMs) with high-quality translation pairs. In this paper, we focus on boosting the many-to-many multilingual translation performance of LLMs with an emphasis on zero-shot translation directions. We demonstrate that prompt strategies adopted during instruction finetuning are crucial to zero-shot translation performance and introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages and improve zero-shot translation performance. XConST is not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for multilingual finetuning on LLMs with translation instructions. Experimental results on ALMA (Xu et al., 2023) and LLaMA-2 (Touvron et al., 2023) show that our approach consistently improves translation performance. Our implementations are available at https://github.com/gpengzhi/CrossConST-LLM.
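The abstract names the regularizer but does not spell out its form. The sketch below assumes that XConST keeps the usual CrossConST shape: a cross-entropy loss on the reference translation when the model is conditioned on the source sentence, plus alpha times a per-token KL divergence. That KL term compares the next-token distribution given the source sentence with the one given the target sentence itself. The function names, the KL direction, and the `alpha` weight are illustrative assumptions. This is not the authors' implementation, which is in the linked repository. Per-step logits are plain Python lists here so that the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(step_logits: Sequence[Sequence[float]],
                  target_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the reference tokens, one logit vector per step."""
    nll = sum(-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids))
    return nll / len(target_ids)


def kl_divergence(logp: Sequence[float], logq: Sequence[float]) -> float:
    """KL(p || q) computed from log-probabilities; always >= 0."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def consistency_regularized_loss(logits_from_source: Sequence[Sequence[float]],
                                 logits_from_target: Sequence[Sequence[float]],
                                 target_ids: Sequence[int],
                                 alpha: float) -> float:
    """Hypothetical CrossConST-style objective: CE + alpha * mean per-token KL.

    logits_from_source: predictions for the reference tokens when the prompt
        contains the source sentence (assumed setup).
    logits_from_target: predictions for the same tokens when the prompt
        contains the target sentence instead (assumed setup).
    """
    ce = cross_entropy(logits_from_source, target_ids)
    kl = sum(
        kl_divergence(log_softmax(a), log_softmax(b))
        for a, b in zip(logits_from_source, logits_from_target)
    ) / len(target_ids)
    return ce + alpha * kl


if __name__ == "__main__":
    # Toy example: 2 decoding steps over a 3-token vocabulary (made-up numbers).
    src = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    tgt = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5]]
    refs = [0, 1]
    print(cross_entropy(src, refs))
    print(consistency_regularized_loss(src, tgt, refs, alpha=1.0))
```

When both inputs yield identical distributions, the KL term vanishes and the loss reduces to plain cross-entropy. Any mismatch between the two views adds a non-negative penalty, which is how the regularizer pulls representations of different languages toward each other.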