Neural machine translation (NMT) is a challenging task because of the inherent complexity and fluidity of natural languages. Nonetheless, in recent years it has achieved state-of-the-art performance on several language pairs. Although multilingual neural machine translation (MNMT) has gained considerable traction in recent years, no comprehensive survey has been done to identify which approaches work well. The goal of this paper is to investigate the realm of low-resource languages and build an NMT model that achieves state-of-the-art results. The paper aims to build upon the mBART language model and explore strategies to augment it with NLP and deep learning techniques such as back-translation and transfer learning. The implementation attempts to unpack the architecture of the NMT application and identify the components that offer opportunities to amend it within the purview of the low-resource language problem space.