Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful human evaluation reveals that translations produced by LLMs still contain multiple errors. Importantly, feeding such error information back into the LLM can trigger self-correction and improve translation performance. Motivated by these insights, we introduce TER (Translate, Estimate, and Refine), a systematic LLM-based self-correcting translation framework. Our findings demonstrate that 1) the framework improves LLM translation quality across a wide range of languages, both from high-resource to low-resource languages and in English-centric as well as non-English-centric directions; 2) TER is more systematic and interpretable than previous methods; and 3) different estimation strategies have different effects on the AI feedback, which directly affects the effectiveness of the final corrections. We further compare different LLMs and run self-correction and cross-model correction experiments to investigate the potential relationship between the translation and evaluation capabilities of LLMs.
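The three stages named in the abstract (Translate, Estimate, Refine) can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `ter_translate`, the `llm` callable (prompt in, completion out), the prompt wording, the "No error" stop signal, and the `max_rounds` parameter are all hypothetical, and the abstract does not specify the actual prompts or estimation strategies.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def ter_translate(source: str, src_lang: str, tgt_lang: str,
                  llm: LLM, max_rounds: int = 1) -> str:
    """Translate, then alternate Estimate and Refine for up to max_rounds."""
    # Translate: produce an initial draft.
    translation = llm(
        f"Translate the following {src_lang} text into {tgt_lang}:\n{source}"
    ).strip()

    for _ in range(max_rounds):
        # Estimate: ask the model to describe errors in the current draft.
        # (Assumed convention: the reply starts with "No error" if clean.)
        feedback = llm(
            f"List the translation errors, or reply 'No error'.\n"
            f"{src_lang} source: {source}\n"
            f"{tgt_lang} translation: {translation}"
        ).strip()
        if feedback.lower().startswith("no error"):
            break

        # Refine: feed the error information back to get a corrected draft.
        translation = llm(
            f"Improve the {tgt_lang} translation using the feedback.\n"
            f"{src_lang} source: {source}\n"
            f"Current translation: {translation}\n"
            f"Feedback: {feedback}\n"
            f"Return only the corrected translation."
        ).strip()

    return translation
```

Passing the same model for every stage corresponds to self-correction; a cross-model variant would take separate callables for the Estimate and Refine stages.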