The year 2024 marks the 10th anniversary of the Multidimensional Quality Metrics (MQM) framework for analytic translation quality evaluation. The MQM error typology is widely used by practitioners in the translation and localization industry and has served as the basis for many derivative projects. The shared tasks of the annual Conference on Machine Translation (WMT) on both human and automatic translation quality evaluation have used the MQM error typology. The metric rests on two pillars: the error typology and the scoring model. The scoring model calculates a quality score from annotation data. It specifies how counts of errors by type and severity are converted into numeric scores that determine whether the content meets specifications. Previously, only the raw scoring model had been published. In April 2024, the MQM Council published the Linear Calibrated Scoring Model, officially presented herein, along with the previously unpublished Non-Linear Scoring Model. This paper details the latest MQM developments and presents a universal approach to translation quality measurement across three sample-size ranges. It also explains why Statistical Quality Control should be used for very small samples, down to a single sentence.
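To illustrate the general idea of turning error annotations into a score, here is a minimal sketch of a raw, penalty-per-word style computation. It is not the Linear Calibrated or Non-Linear Scoring Model. The severity multipliers (neutral 0, minor 1, major 5, critical 25), the optional per-type weights, the maximum score of 100, and the pass threshold are all illustrative assumptions, and the names `raw_quality_score` and `meets_specification` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed severity multipliers (illustrative only; a real specification defines its own).
SEVERITY_MULTIPLIERS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


def raw_quality_score(
    annotations: Iterable[Tuple[str, str]],
    word_count: int,
    type_weights: Optional[Dict[str, float]] = None,
    max_score: float = 100.0,
) -> float:
    """Hypothetical raw score from (error_type, severity) annotations.

    Each error contributes type_weight * severity_multiplier to an absolute
    penalty total; that total is normalized by the evaluated word count and
    subtracted from 1 before scaling to max_score. Very error-dense samples
    can therefore produce negative scores.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    weights = type_weights or {}
    # Absolute penalty total: every error weighted by type (default 1.0) and severity.
    absolute_penalty = sum(
        weights.get(error_type, 1.0) * SEVERITY_MULTIPLIERS[severity]
        for error_type, severity in annotations
    )
    # Per-word penalty: normalizes for the size of the evaluated sample.
    per_word_penalty = absolute_penalty / word_count
    return max_score * (1.0 - per_word_penalty)


def meets_specification(score: float, threshold: float) -> bool:
    """Hypothetical pass/fail check against a project-specific threshold."""
    return score >= threshold


if __name__ == "__main__":
    sample = [
        ("Accuracy/Mistranslation", "major"),
        ("Fluency/Grammar", "minor"),
        ("Fluency/Spelling", "minor"),
        ("Style", "minor"),
    ]
    score = raw_quality_score(sample, word_count=500)
    print(round(score, 2), meets_specification(score, threshold=95.0))
```

With one major and three minor errors in a 500-word sample, the assumed multipliers give a penalty of 8, or 0.016 per word, so the score is about 98.4 and passes the assumed threshold of 95. A per-word normalization like this becomes unstable as the sample shrinks, which is consistent with the abstract's point that very small samples call for a different (statistical) treatment.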