We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated in all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches rank first on all tasks, reaching state-of-the-art performance for quality estimation at word-, span- and sentence-level granularity. Compared to the previous state-of-the-art, COMETKIWI-22, we show large improvements in correlation with human judgements (up to 10 Spearman points). Moreover, we surpass the second-best multilingual submission to the shared task by up to 3.8 absolute points.