Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs can achieve results comparable to those of fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, the reference, translation errors, and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques, including zero-shot, Chain-of-Thought (CoT), and few-shot prompting, across eight language pairs covering high-, medium-, and low-resource languages, using several LLM variants. Our findings indicate the importance of reference translations for LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting than smaller models do. We also observe that LLMs do not always provide a numerical score when generating evaluations, which raises questions about their reliability for the task. Our work presents a comprehensive analysis of resource-constrained, training-free LLM-based evaluation of machine translation. We publicly release the prompt templates, code, and data for reproducibility.