Reranking a list of candidates from a machine translation system with an external scoring model and returning the highest-scoring candidate remains a simple and effective method for improving overall output quality. Translation scoring models continue to grow in size, with the best models now comparable in size to generation models, so reranking can add substantial computational cost to the translation pipeline. In this work, we pose reranking as a Bayesian optimization (BayesOpt) problem. By strategically selecting candidates to score based on a balance of exploration and exploitation, we show that it is possible to find top-scoring candidates while scoring only a fraction of the candidate list. For instance, our method achieves the same CometKiwi score using only 70 scoring evaluations, compared to a baseline system using 180. We also present a multi-fidelity setting for BayesOpt, in which candidates are first scored with a cheaper but noisier proxy scoring model; this further improves the cost-performance tradeoff when using smaller but well-trained distilled proxy scorers.
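To make the idea of reranking under a scoring budget concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the surrogate model, kernel, or acquisition function, so everything here is an assumption chosen for illustration: a Gaussian process with an RBF kernel over hypothetical candidate embeddings, an upper-confidence-bound acquisition rule, and the names `bayesopt_rerank`, `score_fn`, `n_init`, and `beta`. The multi-fidelity variant with a proxy scorer is not covered.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared Euclidean distances between every row of a and every row of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_all, noise=1e-4, length_scale=1.0):
    # Gaussian process posterior mean and standard deviation at x_all,
    # using a constant prior mean equal to the mean of the observed scores.
    mu0 = y_obs.mean()
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_ao = rbf_kernel(x_all, x_obs, length_scale)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs - mu0))
    mean = mu0 + k_ao @ alpha
    v = np.linalg.solve(chol, k_ao.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))


def bayesopt_rerank(embeddings, score_fn, budget, n_init=5, beta=2.0, seed=0):
    # embeddings: (n_candidates, dim) array, assumed to be given (hypothetical input).
    # score_fn(i): the expensive scorer, called at most `budget` times.
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    budget = min(budget, n)
    scored = {}
    # Exploration seed: score a few randomly chosen candidates first.
    for i in rng.choice(n, size=min(n_init, budget), replace=False):
        scored[int(i)] = score_fn(int(i))
    while len(scored) < budget:
        keys = list(scored)
        idx = np.array(keys)
        y = np.array([scored[k] for k in keys])
        mean, std = gp_posterior(embeddings[idx], y, embeddings)
        ucb = mean + beta * std   # trade off exploitation (mean) and exploration (std)
        ucb[idx] = -np.inf        # never re-score a candidate
        nxt = int(np.argmax(ucb))
        scored[nxt] = score_fn(nxt)
    best = max(scored, key=scored.get)
    return best, scored
```

In this sketch, `budget` plays the role of the number of scoring evaluations: the function returns the best candidate among those actually scored, so a smaller budget trades some risk of missing the true top candidate for fewer calls to the expensive scorer.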