POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

Low-resource languages (LRLs) are difficult for supervised neural machine translation because parallel data is scarce, which has motivated research into unsupervised methods. Unsupervised neural machine translation (UNMT) methods, including back-translation, transfer learning, and pivot-based translation, offer practical solutions for LRL translation, but they suffer from synthetic data noise, language bias, and error propagation, issues that Large Language Models (LLMs) can potentially mitigate. LLMs have advanced NMT through in-context learning (ICL) and supervised fine-tuning, but they still perform poorly on LRLs because training data is insufficient. We argue that LLMs can use auxiliary languages to mitigate linguistic noise and improve translation for LRLs. In this paper, we propose the Probability-driven Meta-graph Prompter (POMP), a novel approach that employs a dynamic, sampling-based graph of multiple auxiliary languages to enhance LLMs' translation capabilities for LRLs. POMP constructs a directed acyclic meta-graph for each source language. During training, we dynamically sample multiple paths from this graph and use them to prompt LLMs, which mitigates linguistic noise and improves translations. We evaluate the translations with the BLEURT metric and back-propagate rewards, estimated from the BLEURT scores, to update the probabilities of the auxiliary languages in the paths. Our experiments show significant improvements in translation quality for three LRLs, demonstrating the effectiveness of our approach.
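The sample-score-update loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the update rule, so the multiplicative update below is an assumption, and `MetaGraph`, `toy_reward`, `train`, and the node names `src`, `aux_a`, `aux_b`, and `tgt` are hypothetical names introduced only for illustration. The LLM prompting step and the BLEURT scorer are replaced by a fixed toy reward so that the example runs on its own.

```python
import math
import random


class MetaGraph:
    """Toy directed acyclic graph over languages with sampling weights on edges."""

    def __init__(self, edges, source, target):
        # edges maps each node to its successors; the caller must keep the
        # graph acyclic and make sure every node can reach the target.
        self.source = source
        self.target = target
        # Every outgoing edge starts with the same unnormalized weight.
        self.weights = {u: {v: 1.0 for v in succ} for u, succ in edges.items()}

    def edge_probs(self, node):
        """Normalize the outgoing weights of a node into probabilities."""
        w = self.weights[node]
        total = sum(w.values())
        return {v: x / total for v, x in w.items()}

    def sample_path(self, rng):
        """Walk from source to target, picking each next node by edge probability."""
        path = [self.source]
        while path[-1] != self.target:
            probs = self.edge_probs(path[-1])
            nxt = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
            path.append(nxt)
        return path

    def update(self, path, reward, baseline=0.5, lr=0.5):
        """Assumed rule (not from the paper): scale each edge on the path
        by exp(lr * (reward - baseline))."""
        factor = math.exp(lr * (reward - baseline))
        for u, v in zip(path, path[1:]):
            self.weights[u][v] *= factor


def toy_reward(path):
    # Hypothetical stand-in for a translation quality score in [0, 1].
    # A real pipeline would prompt an LLM along the path and score the output.
    return 0.8 if "aux_a" in path else 0.3


def train(graph, rounds, seed=0):
    """Repeatedly sample a path, score it, and propagate the reward to its edges."""
    rng = random.Random(seed)
    for _ in range(rounds):
        path = graph.sample_path(rng)
        graph.update(path, toy_reward(path))
    return graph


# Hypothetical meta-graph: the source can reach the target directly
# or through one of two auxiliary languages.
edges = {"src": ["aux_a", "aux_b", "tgt"], "aux_a": ["tgt"], "aux_b": ["tgt"]}
graph = train(MetaGraph(edges, "src", "tgt"), rounds=50)
print(graph.edge_probs("src"))
```

With this toy reward, the weight on the `src` to `aux_a` edge can only grow and the weights on the other two edges out of `src` can only shrink, so sampling gradually concentrates on the path that scores best. Replacing `toy_reward` with a real scorer would let rewards move in both directions for every edge.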

