The overall translation quality reached by current machine translation (MT) systems for high-resourced language pairs is remarkably good. Standard methods of evaluation are neither suited nor intended to uncover the many translation errors and quality deficiencies that still persist. Furthermore, the quality of standard reference translations is commonly questioned, and comparable quality levels have been reached by MT alone in several language pairs. Navigating further research in these high-resource settings is thus difficult. In this article, we propose a methodology for creating more reliable document-level human reference translations, called "optimal reference translations," with the simple aim of raising the bar of what should be deemed "human translation quality." We evaluate the obtained document-level optimal reference translations in comparison with "standard" ones, confirming a significant quality increase and also documenting the relationship between evaluation and translation editing.