Direct neural machine translation (direct NMT) translates text directly between two non-English languages. Direct NMT systems often suffer from the scarcity of parallel data between non-English language pairs, and several approaches have been proposed to address this limitation, such as multilingual NMT and pivot NMT (translation between two languages via English). Task-level Mixture-of-Experts models (Task-level MoE), an inference-efficient variant of Transformer-based models, have shown promising NMT performance for a large number of language pairs. In Task-level MoE, different language groups can use different routing strategies to optimize cross-lingual learning and inference speed. In this work, we examine Task-level MoE's applicability to direct NMT and propose a series of high-performing training and evaluation configurations, through which Task-level MoE-based direct NMT systems outperform bilingual and pivot-based models across many low- and high-resource direct pairs and translation directions. Our Task-level MoE with 16 experts outperforms both bilingual NMT and pivot NMT models on 7 language pairs, while pivot-based models still perform better on 9 pairs and directions.
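To make the task-level routing idea concrete, below is a minimal PyTorch sketch of the general technique, not the authors' implementation; names such as `TaskLevelMoELayer`, `task_id`, and the logit-table router are illustrative assumptions. The key property is that the expert choice depends only on the task (e.g., a language pair or language group) rather than on individual tokens, so each translation direction uses a fixed expert and only that expert's parameters are needed at inference time.

```python
import torch
import torch.nn as nn

class TaskLevelMoELayer(nn.Module):
    """Minimal sketch of a task-level MoE feed-forward layer.

    Unlike token-level routing, the expert choice here depends only on a
    task id (e.g., a language pair or language group), so each task uses
    a fixed expert at inference time (illustrative, not the paper's code).
    """

    def __init__(self, d_model: int, d_ff: int, num_experts: int, num_tasks: int):
        super().__init__()
        # One feed-forward expert per slot, as in a standard MoE layer.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Hypothetical router: a learned table of task-to-expert logits,
        # hardened to an argmax choice at inference.
        self.task_to_expert_logits = nn.Parameter(torch.zeros(num_tasks, num_experts))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # All tokens in the batch share one task, hence one expert.
        expert_id = int(self.task_to_expert_logits[task_id].argmax())
        return self.experts[expert_id](x)

# Usage: route a batch of hidden states for one (hypothetical) task id
# through the single expert assigned to that translation direction.
layer = TaskLevelMoELayer(d_model=512, d_ff=2048, num_experts=16, num_tasks=30)
hidden = torch.randn(8, 20, 512)   # (batch, seq_len, d_model)
out = layer(hidden, task_id=3)     # task 3 is illustrative
print(out.shape)                   # torch.Size([8, 20, 512])
```

Because the routing decision is per task rather than per token, different language groups can be mapped to experts under different strategies (shared, per-pair, or per-family) without changing the layer itself, which is the flexibility the abstract refers to.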