Neural Machine Translation (NMT) has become a core technology in natural language processing through extensive research and development. However, the scarcity of high-quality bilingual data remains a major obstacle to improving NMT performance. Recent studies have explored using contextual information from pre-trained language models (PLMs) to address this problem, yet the incompatibility between PLMs and NMT models remains unresolved. This study proposes a PLM-integrated NMT (PiNMT) model to overcome these problems. The PiNMT model consists of three critical components, PLM Multi Layer Converter, Embedding Fusion, and Cosine Alignment, each of which plays a vital role in providing effective PLM information to the NMT model. In addition, two training strategies, Separate Learning Rates and Dual Step Training, are introduced. By applying the proposed PiNMT model and training strategies, we achieved state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset. These results demonstrate a novel approach for efficiently integrating PLMs with NMT that overcomes the incompatibility and improves translation performance.
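The abstract names Cosine Alignment without defining it. As a rough illustration only, one plausible reading is an auxiliary loss that pushes NMT hidden states toward the direction of the corresponding PLM representations. The sketch below is written under that assumption: the function names `cosine_similarity` and `cosine_alignment_loss`, the `1 - cos` form, and the averaging over positions are all hypothetical choices and may differ from the formulation actually used in PiNMT.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def cosine_alignment_loss(
    plm_states: Sequence[Sequence[float]],
    nmt_states: Sequence[Sequence[float]],
) -> float:
    """Hypothetical alignment loss: mean of (1 - cosine similarity) per position.

    Assumption (not from the paper): both inputs hold one vector per token
    position and have already been projected to the same dimension.
    The loss is 0 when every pair points the same way and 2 when every
    pair points in opposite directions.
    """
    if len(plm_states) != len(nmt_states) or not plm_states:
        raise ValueError("inputs must be non-empty and of equal length")
    per_position = [
        1.0 - cosine_similarity(p, n) for p, n in zip(plm_states, nmt_states)
    ]
    return sum(per_position) / len(per_position)
```

In a training setup of this kind, such a term would typically be added to the translation loss with a weighting coefficient; that weighting is likewise an assumption here, not something the abstract specifies.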