Automatic metrics for machine translation (MT) have made significant progress, with the goal of replacing expensive and time-consuming human evaluation. These metrics are typically assessed by their correlation with human judgments, which captures the monotonic relationship between human and metric scores. However, we argue that it is equally important to ensure that metrics treat all systems fairly and consistently. In this paper, we introduce a method to evaluate this property.
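As a minimal illustration, consider how a rank correlation such as Kendall's τ only measures whether metric and human scores order items the same way. Any strictly order-preserving transformation of the human scores receives a perfect score, regardless of scale. The sketch below uses made-up scores, not data from this paper, and implements the simple tau-a variant, in which tied pairs count as neither concordant nor discordant. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either sequence count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both sequences order items i and j the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical human scores and a metric that is nonlinear but order-preserving.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [h ** 3 for h in human]
print(kendall_tau_a(human, metric))  # 1.0
```

The metric agrees perfectly in rank even though its scale differs sharply from the human one. This is why a correlation-based assessment alone says little about whether scores are comparable and consistent across systems.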