Existing multilingual neural machine translation (MNMT) approaches focus mainly on improving encoder-decoder models to translate multiple languages. The decoder-only architecture, by contrast, has been explored less in MNMT because it underperforms when trained solely on parallel data. In this work, we attribute this underperformance to the decoder-only architecture's lack of language transfer capability; specifically, it fails to adequately encode source tokens with target-language features. We propose dividing the decoding process into two stages, explicitly excluding target tokens from the first stage to implicitly strengthen transfer across languages. In addition, we apply contrastive learning to translation instructions, which further improves zero-shot translation. We conduct experiments on the TED-19 and OPUS-100 datasets under both training-from-scratch and fine-tuning scenarios. The results show that, compared with the encoder-decoder architecture, our methods not only perform competitively on supervised translation but also achieve gains of up to 3.39 BLEU, 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET on zero-shot translation.
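To make the contrastive-learning component concrete: the idea of imposing a contrastive objective on translation instructions can be sketched with a generic InfoNCE-style loss that pulls together instruction representations sharing the same target language and pushes apart the rest. This is a minimal illustration under our own assumptions (pooled instruction embeddings, in-batch negatives, a `temperature` hyperparameter), not the paper's actual implementation.

```python
import numpy as np

def instruction_contrastive_loss(inst_emb, lang_ids, temperature=0.1):
    """InfoNCE-style contrastive loss over translation-instruction embeddings.

    inst_emb: (batch, dim) pooled instruction representations (assumed shape).
    lang_ids: (batch,) target-language index for each example; examples with
              the same target language are treated as positives.
    """
    # Cosine similarity via L2-normalized embeddings, scaled by temperature.
    z = inst_emb / np.linalg.norm(inst_emb, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    # Positives: same target language, excluding the example itself.
    pos = (lang_ids[None, :] == lang_ids[:, None]) & ~self_mask

    # Row-wise log-softmax over all other examples (self excluded).
    logits = np.where(self_mask, -np.inf, sim)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

    # Average negative log-likelihood of the positives per example;
    # examples with no in-batch positive contribute zero.
    pos_counts = np.maximum(pos.sum(axis=1), 1)
    loss = -(np.where(pos, log_prob, 0.0).sum(axis=1) / pos_counts)
    return loss.mean()
```

In this sketch the loss is minimized when instructions for the same target language cluster together in embedding space, which is one plausible way such an objective could sharpen the language signal available for zero-shot directions.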