The Bahnar people, an ethnic minority in Vietnam with a rich ancestral heritage, possess a language of immense cultural and historical significance. The government places strong emphasis on preserving and promoting the Bahnaric language by making it accessible online and encouraging communication across generations. Recent advances in artificial intelligence, such as Neural Machine Translation (NMT), have transformed translation by improving accuracy and fluency. This, in turn, contributes to the revival of the language through education, communication, and documentation. In particular, NMT makes information and content more readily available to Bahnaric speakers. Nevertheless, translating Vietnamese into Bahnaric faces practical challenges because the resources available for the Bahnaric language are limited. To address this, we employ state-of-the-art NMT techniques along with two augmentation strategies for the domain-specific Vietnamese-Bahnaric translation task. Importantly, both approaches are flexible and can be used with various neural machine translation models. Additionally, they require no complex data preprocessing steps, no training of additional systems, and no extra data beyond the existing parallel training corpora.