Current machine translation (MT) systems achieve very good results on a growing variety of language pairs and data sets. However, it is now well known that their output, although fluent, often contains serious meaning errors. The quality estimation (QE) task is to estimate the quality of translations produced by an MT system without relying on reference translations. A number of approaches have been suggested over the years. In this paper we show that the parallel corpus used to train the MT system holds direct clues for estimating the quality of the translations it produces. Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data-driven MT system.
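The abstract does not spell out how the training corpus is used, so the following is only a minimal sketch of one way such clues might be extracted, not the method of this paper. It assumes that a translation whose n-grams were frequently seen on the target side of the training data is more likely to be reliable. The function names (`build_target_ngram_index`, `coverage_score`), the whitespace tokenisation, and the choice of n-gram orders 1 to 3 are all illustrative assumptions.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_target_ngram_index(parallel_corpus, max_n=3):
    """Collect every target-side n-gram (orders 1..max_n) from the training data.

    parallel_corpus is an iterable of (source_sentence, target_sentence) pairs.
    Tokenisation is naive lowercased whitespace splitting (an assumption).
    """
    index = set()
    for _source, target in parallel_corpus:
        tokens = target.lower().split()
        for n in range(1, max_n + 1):
            index.update(ngrams(tokens, n))
    return index


def coverage_score(hypothesis, index, max_n=3):
    """Crude reference-free quality proxy in [0, 1].

    For each n-gram order, compute the fraction of hypothesis n-grams that
    occur in the training index, then average over the orders that exist.
    """
    tokens = hypothesis.lower().split()
    per_order = []
    for n in range(1, max_n + 1):
        grams = ngrams(tokens, n)
        if not grams:
            continue  # hypothesis is shorter than n tokens
        seen = sum(1 for g in grams if g in index)
        per_order.append(seen / len(grams))
    return sum(per_order) / len(per_order) if per_order else 0.0


if __name__ == "__main__":
    # Toy training corpus, invented purely for illustration.
    corpus = [
        ("das haus ist klein", "the house is small"),
        ("die katze schlaeft", "the cat sleeps"),
    ]
    index = build_target_ngram_index(corpus)
    for hyp in ["the house is small", "the cat is small", "purple elephants dance"]:
        print(hyp, "->", round(coverage_score(hyp, index), 3))
```

A score like this only reflects how familiar the output looks relative to the training data. It ignores the source sentence entirely, so a practical estimator would presumably also use source-side evidence from the same corpus.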