Low-resource languages (LRLs) pose challenges for supervised neural machine translation (NMT) because parallel data is limited, which has prompted research into unsupervised methods. Unsupervised neural machine translation (UNMT) methods, including back-translation, transfer learning, and pivot-based translation, offer practical solutions for LRL translation, but they are hindered by issues such as synthetic-data noise, language bias, and error propagation, which Large Language Models (LLMs) can potentially mitigate. LLMs have advanced NMT through in-context learning (ICL) and supervised fine-tuning, but insufficient training data still leads to poor performance on LRLs. We argue that LLMs can use auxiliary languages to mitigate this linguistic noise and thereby improve translation for LRLs. In this paper, we propose the Probability-driven Meta-graph Prompter (POMP), a novel approach that employs a dynamic, sampling-based graph of multiple auxiliary languages to enhance LLMs' translation capabilities for LRLs. POMP constructs a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs, mitigating linguistic noise and improving translations during training. We evaluate the translations with the BLEURT metric and back-propagate rewards estimated from these scores to update the probabilities of the auxiliary languages in the sampled paths. Our experiments show significant improvements in translation quality for three LRLs, demonstrating the effectiveness of our approach.
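The sample-score-update loop described in the abstract can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the POMP implementation: the graph layout, the node names (`aux_a`, `aux_b`, `aux_c`, `END`), the exponentiated-reward update, the running-mean baseline, and the `score_fn` stub (standing in for prompting an LLM with the sampled auxiliary languages and scoring its output with BLEURT) are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical meta-graph: a DAG from the source node "src" to a terminal
# node "END" ("stop adding auxiliary languages"). Node names are placeholders,
# not the auxiliary languages used in the paper.
Graph = Dict[str, Dict[str, float]]  # node -> {successor: selection probability}


def make_meta_graph() -> Graph:
    return {
        "src": {"aux_a": 0.5, "aux_b": 0.5},
        "aux_a": {"aux_c": 0.5, "END": 0.5},
        "aux_b": {"aux_c": 0.5, "END": 0.5},
        "aux_c": {"END": 1.0},
    }


def sample_path(graph: Graph, rng: random.Random, start: str = "src") -> List[str]:
    """Walk the DAG from `start`, picking each successor by its probability."""
    path = [start]
    node = start
    while node != "END":
        successors = list(graph[node])
        weights = [graph[node][s] for s in successors]
        node = rng.choices(successors, weights=weights, k=1)[0]
        path.append(node)
    return path


def update_probabilities(graph: Graph, path: List[str], reward: float, lr: float = 0.5) -> None:
    """Scale every edge on the path by exp(lr * reward), then renormalize each
    touched node's outgoing distribution so it sums to 1 again.
    (Assumed update rule, chosen for illustration.)"""
    for u, v in zip(path, path[1:]):
        graph[u][v] *= math.exp(lr * reward)
        total = sum(graph[u].values())
        for s in graph[u]:
            graph[u][s] /= total


def train(graph: Graph, score_fn: Callable[[List[str]], float], steps: int,
          rng: random.Random, baseline_decay: float = 0.9) -> Graph:
    """Sample a path, score it, and reinforce it relative to a running-mean baseline.
    `score_fn` is a hypothetical stand-in for LLM prompting + BLEURT scoring."""
    baseline = 0.0
    for _ in range(steps):
        path = sample_path(graph, rng)
        score = score_fn(path)
        reward = score - baseline  # above-average paths get a positive reward
        update_probabilities(graph, path, reward)
        baseline = baseline_decay * baseline + (1 - baseline_decay) * score
    return graph
```

For example, `train(make_meta_graph(), lambda p: 1.0 if "aux_b" in p else 0.0, steps=200, rng=random.Random(0))` shifts probability mass at `src` toward `aux_b`. Subtracting a baseline means a path is reinforced only when it scores above the recent average, which is a common variance-reduction choice in reward-driven updates; whether POMP uses a baseline is not stated here.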