Existing multilingual neural machine translation (MNMT) approaches focus mainly on improving encoder-decoder models to translate multiple languages. The decoder-only architecture, by contrast, has been explored less in MNMT because it underperforms when trained solely on parallel data. In this work, we attribute this underperformance to the decoder-only architecture's lack of language transfer capability; specifically, it fails to adequately encode source tokens with target-language features. We propose dividing the decoding process into two stages, explicitly excluding target tokens from the first stage to implicitly strengthen transfer across languages. In addition, we apply contrastive learning to translation instructions, which further improves zero-shot translation. We conduct experiments on the TED-19 and OPUS-100 datasets under both training-from-scratch and fine-tuning scenarios. The results show that, compared with the encoder-decoder architecture, our methods not only perform competitively on supervised translation but also achieve gains of up to 3.39 BLEU, 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET on zero-shot translation.
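To make the contrastive-learning component concrete: the idea of imposing a contrastive objective on translation instructions can be sketched with a generic InfoNCE-style loss that pulls together instruction representations sharing the same target language and pushes apart the rest. This is a minimal illustration under our own assumptions (pooled instruction embeddings, in-batch negatives, a `temperature` hyperparameter), not the paper's actual implementation.

```python
import numpy as np

def instruction_contrastive_loss(inst_emb, lang_ids, temperature=0.1):
    """InfoNCE-style contrastive loss over translation-instruction embeddings.

    inst_emb: (batch, dim) pooled instruction representations (assumed shape).
    lang_ids: (batch,) target-language index for each example; examples with
              the same target language are treated as positives.
    """
    # Cosine similarity via L2-normalized embeddings, scaled by temperature.
    z = inst_emb / np.linalg.norm(inst_emb, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    # Positives: same target language, excluding the example itself.
    pos = (lang_ids[None, :] == lang_ids[:, None]) & ~self_mask

    # Row-wise log-softmax over all other examples (self excluded).
    logits = np.where(self_mask, -np.inf, sim)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

    # Average negative log-likelihood of the positives per example;
    # examples with no in-batch positive contribute zero.
    pos_counts = np.maximum(pos.sum(axis=1), 1)
    loss = -(np.where(pos, log_prob, 0.0).sum(axis=1) / pos_counts)
    return loss.mean()
```

In this sketch the loss is minimized when instructions for the same target language cluster together in embedding space, which is one plausible way such an objective could sharpen the language signal available for zero-shot directions.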