Large language models (LLMs) have demonstrated remarkable proficiency in machine translation (MT), even without specific training on the languages in question. However, translating rare words in low-resource or domain-specific contexts remains challenging for LLMs. To address this issue, we propose a multi-step prompt chain that enhances translation faithfulness by prioritizing key terms crucial for semantic accuracy. Our method first identifies these keywords and retrieves their translations from a bilingual dictionary, integrating them into the LLM's context using Retrieval-Augmented Generation (RAG). We further mitigate potential output hallucinations caused by long prompts through an iterative self-checking mechanism, where the LLM refines its translations based on lexical and semantic constraints. Experiments using Llama and Qwen as base models on the FLORES-200 and WMT datasets demonstrate significant improvements over baselines, highlighting the effectiveness of our approach in enhancing translation faithfulness and robustness, particularly in low-resource scenarios.
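The pipeline described above can be sketched as a minimal prompt chain. This is an illustrative assumption of how the steps fit together, not the paper's implementation: the `llm` callable, prompt wording, and `max_rounds` parameter are all hypothetical stand-ins.

```python
def translate_with_keyword_rag(llm, source, dictionary, src_lang, tgt_lang,
                               max_rounds=2):
    """Hedged sketch of the multi-step chain: keyword extraction,
    bilingual-dictionary retrieval (RAG), and iterative self-checking."""
    # Step 1: identify rare / domain-specific key terms in the source.
    keywords = llm(f"List the rare or domain-specific terms in this "
                   f"{src_lang} sentence, comma-separated: {source}")
    terms = [t.strip() for t in keywords.split(",") if t.strip()]

    # Step 2: retrieve their translations from the bilingual dictionary.
    glossary = {t: dictionary[t] for t in terms if t in dictionary}
    hints = "; ".join(f"{s} -> {t}" for s, t in glossary.items())

    # Step 3: translate with the retrieved term pairs in context.
    draft = llm(f"Translate from {src_lang} to {tgt_lang}, using these "
                f"term translations: {hints}\nSource: {source}")

    # Step 4: iterative self-check against the lexical constraints;
    # re-prompt until every required target term appears in the draft.
    for _ in range(max_rounds):
        missing = [t for t in glossary.values() if t not in draft]
        if not missing:
            break
        draft = llm(f"Revise this {tgt_lang} translation so it uses the "
                    f"required terms {missing} and stays faithful to: "
                    f"{source}\nDraft: {draft}")
    return draft
```

In this sketch the self-check is a simple lexical containment test; the paper additionally applies semantic constraints, which would replace or augment the `missing`-term check.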