Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful human evaluations reveal that the translations produced by LLMs still contain multiple errors. Importantly, feeding such error information back into the LLMs can lead to self-refinement and result in improved translation performance. Motivated by these insights, we introduce a systematic LLM-based self-refinement translation framework, named \textbf{TEaR}, which stands for \textbf{T}ranslate, \textbf{E}stimate, \textbf{a}nd \textbf{R}efine, marking a significant step forward in this direction. Our findings demonstrate that: 1) our self-refinement framework successfully assists LLMs in improving their translation quality across a wide range of languages, whether translating from high-resource languages into low-resource ones, and whether the language pairs are English-centric or centered around other languages; 2) TEaR exhibits superior systematicity and interpretability; 3) different estimation strategies yield varied impacts, directly affecting the effectiveness of the final corrections. Additionally, traditional neural translation models and evaluation models operate separately, each focusing on a single task due to their limited capabilities, whereas general-purpose LLMs possess the capability to undertake both tasks simultaneously. We further conduct cross-model correction experiments to investigate the potential relationship between the translation and evaluation capabilities of general-purpose LLMs. Our code and data are available at https://github.com/fzp0424/self_correct_mt
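The three-stage loop the abstract names (Translate, Estimate, Refine) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `call_llm` callable and the prompt wording are assumptions introduced here for clarity, standing in for whatever model API and prompt templates the framework uses.

```python
def tear(source: str, target_lang: str, call_llm) -> str:
    """One round of the Translate-Estimate-Refine loop (illustrative sketch).

    `call_llm` is a hypothetical function mapping a prompt string to the
    model's text response; prompt phrasing here is a placeholder.
    """
    # Translate: produce an initial draft translation.
    draft = call_llm(f"Translate into {target_lang}: {source}")

    # Estimate: ask the model to assess its own draft and report errors.
    feedback = call_llm(
        f"List any translation errors.\nSource: {source}\nTranslation: {draft}"
    )

    # Refine: feed the error feedback back in to produce a corrected translation.
    refined = call_llm(
        f"Improve the translation using the feedback.\n"
        f"Source: {source}\nTranslation: {draft}\nFeedback: {feedback}"
    )
    return refined


# Usage with a deterministic mock model, just to show the control flow:
def mock_llm(prompt: str) -> str:
    if prompt.startswith("Translate"):
        return "draft translation"
    if prompt.startswith("List"):
        return "mistranslated term X"
    return "refined translation"


result = tear("source sentence", "English", mock_llm)
```

The key idea the abstract highlights is the Estimate step: the quality of the error estimation directly shapes how effective the final Refine step can be.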