Quality estimation for cross-lingual Machine Translation (MT) plays a crucial role in evaluating translation performance. GEMBA, the first MT quality assessment metric based on Large Language Models (LLMs), uses one-step prompting to achieve state-of-the-art (SOTA) results in system-level MT quality estimation; however, it lacks segment-level analysis. Chain-of-Thought (CoT) prompting, in contrast, outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce the Knowledge-Prompted Estimator (KPE), a CoT prompting method that combines three one-step prompting techniques: perplexity, token-level similarity, and sentence-level similarity. KPE achieves better segment-level estimation performance than previous deep learning models and one-step prompting approaches. Furthermore, supplementary experiments on visualized word-level alignment show that KPE significantly improves token alignment over earlier models and provides better interpretability for MT quality estimation. Code will be released upon publication.
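To make the three signals named in the abstract concrete, the following is a minimal sketch of what each quantity measures and how they might be placed into a single step-by-step prompt. It is not the KPE implementation: the abstract does not specify how the signals are obtained or combined, so the sketch assumes they can be computed numerically from per-token log-probabilities and embedding vectors supplied by the caller. All function names, the greedy alignment rule, and the prompt wording are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def token_similarity(src_vecs, tgt_vecs):
    """Assumed token-level score: each source token is greedily matched to its
    most similar target token, and the best-match scores are averaged."""
    if not src_vecs or not tgt_vecs:
        return 0.0
    best = [max(cosine(s, t) for t in tgt_vecs) for s in src_vecs]
    return sum(best) / len(best)


def build_prompt(source, translation, ppl, tok_sim, sent_sim):
    """Hypothetical prompt that lists the three signals before asking for a
    segment-level score; the wording is illustrative only."""
    return (
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        f"Step 1. Perplexity of the translation: {ppl:.2f}\n"
        f"Step 2. Token-level similarity: {tok_sim:.2f}\n"
        f"Step 3. Sentence-level similarity: {sent_sim:.2f}\n"
        "Step 4. Reason over steps 1-3, then rate the translation from 0 to 100."
    )


if __name__ == "__main__":
    # Toy inputs; a real system would take these from a language model.
    ppl = perplexity([math.log(0.5)] * 4)
    tok = token_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
    sent = cosine([1.0, 1.0], [1.0, 0.5])
    print(build_prompt("Guten Morgen", "Good morning", ppl, tok, sent))
```

Keeping the signals as explicit intermediate steps, rather than asking for a score in one shot, is the general idea behind CoT prompting that the abstract contrasts with one-step prompting; the specific step ordering above is an assumption.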