Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation


April 23, 2024

Zhaokun Jiang, Ziyin Zhang

Large language models have demonstrated translation performance comparable or even superior to that of neural machine translation (NMT) systems. However, existing comparative studies of the two mainly rely on automated metrics, raising questions about the feasibility of these metrics and their alignment with human judgment. The present study investigates the convergences and divergences between automated metrics and human evaluation in assessing the quality of machine translation from ChatGPT and three NMT systems. Automatic assessment employs four automated metrics, while human evaluation incorporates the DQF-MQM error typology and six rubrics. Notably, automatic assessment and human evaluation converge in measuring formal fidelity (e.g., error rates) but diverge when evaluating semantic and pragmatic fidelity, with automated metrics failing to capture the improvement in ChatGPT's translations brought about by prompt engineering. These results underscore the indispensable role of human judgment in evaluating the performance of advanced translation tools at the current stage.
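The abstract frames the study around whether automated metrics align with human judgment. One common way to quantify such alignment is a rank correlation between metric scores and human ratings for the same translations. The sketch below is a generic illustration under that assumption only. The abstract does not say which statistic the authors used, so the choice of Spearman's rho, the helper names `average_ranks`, `pearson` and `spearman`, and all scores shown are hypothetical and are not the study's method or data.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four systems (e.g. one LLM and three NMT systems).
metric_scores = [0.62, 0.71, 0.55, 0.80]  # an automated metric, higher is better
human_ratings = [3.5, 3.0, 4.0, 4.5]      # a human rubric score, higher is better
print(round(spearman(metric_scores, human_ratings), 2))
```

A rho close to 1 would suggest convergence (the metric orders translations the way human raters do), while a low or negative rho would suggest divergence. Ranks are used rather than raw values because metric scores and rubric ratings sit on different scales.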
