State-of-the-art trainable machine translation evaluation metrics like xCOMET achieve high correlation with human judgment but rely on large encoders (up to 10.7B parameters), making them computationally expensive and inaccessible to researchers with limited resources. To address this issue, we investigate whether the knowledge stored in these large encoders can be compressed while maintaining quality. We employ distillation, quantization, and pruning techniques to create efficient xCOMET alternatives and introduce a novel data collection pipeline for efficient black-box distillation. Our experiments show that, using quantization, xCOMET can be compressed up to threefold with no quality degradation. Additionally, through distillation, we create a 278M-parameter xCOMET-lite metric, which has only 2.6% of xCOMET-XXL's parameters but retains 92.1% of its quality. Moreover, it surpasses strong small-scale metrics like COMET-22 and BLEURT-20 on the WMT22 metrics challenge dataset by 6.4%, despite using 50% fewer parameters. All code, datasets, and models are available online at https://github.com/NL2G/xCOMET-lite.
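To make the quantization claim concrete, the sketch below illustrates the general idea of symmetric 8-bit weight quantization: each float32 weight is mapped to an int8 value plus a shared scale, cutting per-weight storage roughly fourfold. This is a minimal, hypothetical illustration of the technique; the function names and the per-tensor scaling scheme are assumptions, not the paper's actual implementation.

```python
# Minimal sketch of symmetric per-tensor int8 quantization (hypothetical
# illustration of the general technique, not the xCOMET-lite codebase).

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid scale == 0
    q = [round(w / scale) for w in weights]            # each fits in one byte
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# int8 stores 1 byte per weight vs. 4 bytes for float32: ~4x smaller,
# with reconstruction error bounded by half the scale per weight.
```

In practice, libraries quantize per-channel and calibrate activations as well, but the storage saving shown here is the core reason a quantized encoder can shrink severalfold with little quality loss.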