Since its introduction, the COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality. Its success stems from being a modified pre-trained multilingual model finetuned for quality assessment. However, being a machine learning model, it also gives rise to a new set of pitfalls that may not be widely known. We investigate these unexpected behaviours from three aspects: 1) technical: obsolete software versions and compute precision; 2) data: empty content, language mismatch, and translationese at test time, as well as distribution and domain biases in training; 3) usage and reporting: multi-reference support and model referencing in the literature. All of these problems imply that COMET scores are not comparable across papers or even across technical setups, and we put forward our perspective on fixing each issue. Furthermore, we release the SacreCOMET package, which can generate a signature for the software and model configuration, as well as an appropriate citation. The goal of this work is to help the community make more sound use of the COMET metric.
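As a hedged sketch of the compute-precision pitfall, the snippet below scores the same translation twice with the unbabel-comet package, once at default precision and once after casting to half precision. The calls to download_model, load_from_checkpoint, and predict follow the package's public API; invoking fp16 inference via .half() is our assumption for illustration, and the package may expose a different mechanism.

    # A minimal sketch of the precision pitfall (pip install unbabel-comet).
    from comet import download_model, load_from_checkpoint

    data = [{"src": "Der Hund bellt.",
             "mt": "The dog barks.",
             "ref": "The dog is barking."}]

    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    score_fp32 = model.predict(data, batch_size=1, gpus=0).system_score

    # Assumption: casting with .half() forces fp16; fp16 typically needs a GPU.
    score_fp16 = model.half().predict(data, batch_size=1, gpus=1).system_score

    # The two scores may drift apart; without reporting the precision (and
    # software version), such numbers are not comparable across papers.
    print(f"fp32: {score_fp32:.4f}  fp16: {score_fp16:.4f}")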
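To make the signature idea concrete, the following is a minimal sketch of what a SacreCOMET-style signature could record, assembled from the factors named above (software versions, compute precision, and the exact scoring model). The helper comet_signature and the pipe-separated format are illustrative assumptions, not SacreCOMET's actual API.

    # Hypothetical sketch of a reproducibility signature; the helper and
    # format are assumptions, not the actual SacreCOMET interface.
    import platform
    from importlib.metadata import version

    def comet_signature(model_name: str, precision: str = "fp32") -> str:
        """Record the factors that can silently change COMET scores."""
        parts = [
            f"Python{platform.python_version()}",
            f"Comet{version('unbabel-comet')}",  # requires unbabel-comet installed
            precision,                           # e.g. "fp32" or "fp16"
            model_name,                          # the exact scoring checkpoint
        ]
        return "|".join(parts)

    # Example output: Python3.11.8|Comet2.2.2|fp32|Unbabel/wmt22-comet-da
    print(comet_signature("Unbabel/wmt22-comet-da"))

Reporting such a signature alongside a score pins down the configuration a number was produced under, which is what makes scores comparable across papers.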