Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

Large Language Models (LLMs) have achieved remarkable results in machine translation evaluation, yet little is known about how they use the provided inputs when conducting evaluations. This study explores how LLMs leverage source and reference information when evaluating translations, with the ultimate goal of better understanding how LLMs work. To this end, we design controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versus reference information. Surprisingly, we find that reference information significantly improves evaluation accuracy, while source information is sometimes counterproductive, indicating a lack of cross-lingual capability when LLMs are used to evaluate translations. We further conduct a meta-evaluation of LLMs on translation error detection and observe a similar phenomenon. These findings also point to a potential research direction: fully exploiting the cross-lingual capability of LLMs to achieve better performance in machine translation evaluation.
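
The controlled comparison of input modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the mode names (`T`, `S-T`, `R-T`, `S-R-T`), the helpers `build_prompt` and `kendall_tau`, and the 0 to 100 scoring instruction are all hypothetical, and segment-level Kendall's tau against human scores stands in for whatever meta-evaluation statistic the study actually reports.

```python
from itertools import combinations

# Hypothetical input modes for a controlled ablation: each mode lists which
# fields of a segment are shown to the LLM evaluator (T = translation,
# S = source, R = reference). Names are illustrative, not from the paper.
INPUT_MODES = {
    "T": ("translation",),
    "S-T": ("source", "translation"),
    "R-T": ("reference", "translation"),
    "S-R-T": ("source", "reference", "translation"),
}

FIELD_LABELS = {
    "source": "Source",
    "reference": "Reference",
    "translation": "Translation",
}


def build_prompt(mode: str, segment: dict, src_lang: str, tgt_lang: str) -> str:
    """Build a coarse-grained (single-score) prompt that exposes only the
    fields selected by `mode`. The instruction wording is an assumption."""
    lines = [
        f"Rate the quality of this {src_lang}-to-{tgt_lang} translation "
        "on a scale from 0 to 100."
    ]
    # Only the fields for this mode are included, so any change in the
    # evaluator's behaviour can be attributed to the added or removed input.
    for field in INPUT_MODES[mode]:
        lines.append(f"{FIELD_LABELS[field]}: {segment[field]}")
    lines.append("Score:")
    return "\n".join(lines)


def kendall_tau(metric_scores, human_scores) -> float:
    """Kendall's tau-a between metric and human scores:
    (concordant - discordant) / total pairs. Tied pairs count as neither."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equal-length score lists with >= 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    segment = {
        "source": "Das Wetter ist heute schön.",
        "reference": "The weather is nice today.",
        "translation": "Today the weather is beautiful.",
    }
    for mode in INPUT_MODES:
        print(f"--- {mode} ---")
        print(build_prompt(mode, segment, "German", "English"))
    # Toy scores, purely illustrative; not results from the study.
    print(kendall_tau([70, 85, 40, 90], [60, 80, 50, 95]))
```

In such a setup, each prompt would be sent to the LLM under test, and the per-segment scores for each mode would be correlated with human judgments; comparing `S-T` with `T` isolates the contribution of the source, and `R-T` with `T` that of the reference. The fine-grained (error-level) prompts mentioned in the abstract would need a different output format and metric, which this sketch does not cover.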
