IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies that meta-evaluate the evaluation metrics in use, enabling a better selection of the metrics that best reflect MT quality. Unfortunately, most of this research focuses on high-resource languages, mainly English, and its observations may not always apply to other languages. Indian languages, which have over a billion speakers, are linguistically different from English, and to date there has been no systematic study of evaluating MT systems that translate from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and that there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
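
As a rough illustration of what "establishing correlations between annotator scores and automatic metric scores" can look like, the sketch below computes a Pearson correlation and a Kendall tau over per-segment scores. It is only a minimal sketch: the score values are invented toy numbers rather than data from the dataset, the names `pearson`, `kendall_tau`, `human_scores`, and `metric_scores` are hypothetical, and the paper's actual scoring and aggregation procedure may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy data: one human (annotator-derived) score and one
# automatic metric score per translated segment. Higher means better.
human_scores = [25.0, 18.0, 22.0, 10.0, 5.0]
metric_scores = [0.81, 0.70, 0.62, 0.45, 0.30]

r = pearson(human_scores, metric_scores)
tau = kendall_tau(human_scores, metric_scores)
print(f"Pearson r = {r:.3f}, Kendall tau = {tau:.3f}")
```

In a meta-evaluation, the same computation would be repeated for each metric (and typically for each language), and metrics would then be compared by how strongly they correlate with the human scores.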

