Efficiency is key to fostering inclusiveness and reducing environmental costs, especially in the era of LLMs. In this work, we provide a comprehensive evaluation of efficiency for MT evaluation metrics. Our approach involves replacing computation-intensive transformers with lighter alternatives and employing linear and quadratic approximations for alignment algorithms on top of LLM representations. We evaluate six (reference-free and reference-based) metrics across three MT datasets and examine 16 lightweight transformers. In addition, we look into the training efficiency of metrics like COMET by utilizing adapters. Our results indicate that (a) TinyBERT provides the best balance between quality and efficiency; (b) speed-ups on CPU are more substantial than those on GPU; (c) WMD approximations yield no efficiency gains while reducing quality; and (d) adapters enhance training efficiency (in terms of backward-pass speed and memory requirements) as well as, in some cases, metric quality. These findings can help to strike a balance between evaluation speed and quality, which is essential for effective NLG systems. Furthermore, our research contributes to the ongoing efforts to optimize NLG evaluation metrics with minimal impact on performance. To our knowledge, ours is the most comprehensive analysis of different aspects of efficiency for MT metrics conducted so far.
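To make the notion of linear and quadratic approximations of an alignment algorithm concrete, the following is a minimal sketch of two well-known lower bounds on Word Mover's Distance (WMD): the word centroid distance, which is linear in the number of tokens, and the relaxed WMD, which is quadratic. Whether these are exactly the variants evaluated in this work is an assumption; the function names, the NumPy implementation, and the use of Euclidean distance over token embeddings are illustrative choices, not details taken from the text.

```python
import numpy as np


def centroid_distance(x_emb, x_weights, y_emb, y_weights):
    """Linear-time bound: distance between the weighted mean embeddings.

    x_emb, y_emb: arrays of shape (n, d) and (m, d) with token embeddings.
    x_weights, y_weights: non-negative weights of shape (n,) and (m,),
    each summing to 1 (illustrative assumption).
    """
    x_centroid = x_weights @ x_emb
    y_centroid = y_weights @ y_emb
    return float(np.linalg.norm(x_centroid - y_centroid))


def relaxed_wmd(x_emb, x_weights, y_emb, y_weights):
    """Quadratic-time bound: drop one marginal constraint of the transport problem.

    Each token sends all of its mass to its nearest neighbour on the
    other side; the larger of the two directional costs is returned.
    """
    # Pairwise Euclidean distances, shape (n, m).
    cost = np.linalg.norm(x_emb[:, None, :] - y_emb[None, :, :], axis=-1)
    x_to_y = float(x_weights @ cost.min(axis=1))
    y_to_x = float(y_weights @ cost.min(axis=0))
    return max(x_to_y, y_to_x)


if __name__ == "__main__":
    # Toy 2-d "embeddings" for two short token sequences.
    hyp = np.array([[0.0, 0.0], [1.0, 0.0]])
    ref = np.array([[0.0, 0.0], [3.0, 0.0]])
    w = np.array([0.5, 0.5])
    print("centroid distance:", centroid_distance(hyp, w, ref, w))
    print("relaxed WMD:", relaxed_wmd(hyp, w, ref, w))
```

Both functions avoid solving the full optimal-transport problem that exact WMD requires, which is where the theoretical saving comes from. Whether that saving shows up in practice depends on sequence length and on how much of the total runtime is spent on alignment rather than on the encoder.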