Minimum Bayes risk (MBR) decoding outputs the hypothesis with the highest expected utility over the model distribution for some utility function. It has been shown to improve accuracy over beam search in conditional language generation problems, especially neural machine translation, in both human and automatic evaluations. However, the standard sampling-based algorithm for MBR is substantially more computationally expensive than beam search: it requires a large number of samples and a number of utility function calls that is quadratic in the number of samples, which limits its applicability. We describe an algorithm for MBR which gradually grows the number of samples used to estimate the utility while pruning hypotheses that are unlikely to have the highest utility according to confidence estimates obtained with bootstrap sampling. Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR while being statistically indistinguishable in terms of accuracy. We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
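The contrast between standard sampling-based MBR and the grow-and-prune idea described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation evaluated in the experiments: the names `jaccard`, `mbr_exhaustive`, `mbr_pruned`, and `draw_reference`, the sample-size `schedule`, the threshold `alpha`, and the number of bootstrap resamples `n_boot` are all hypothetical, the token-overlap utility is a toy stand-in for chrF++ or COMET, and the specific pruning rule (drop a hypothesis when it rarely matches the current best across bootstrap resamples) is one plausible reading of "confidence estimates obtained with bootstrap sampling".

```python
import random
from typing import Callable, List, Optional, Sequence

Utility = Callable[[str, str], float]


def jaccard(hyp: str, ref: str) -> float:
    """Toy utility (token-set overlap); a stand-in for chrF++ or COMET."""
    a, b = set(hyp.split()), set(ref.split())
    return len(a & b) / len(a | b) if a | b else 1.0


def mbr_exhaustive(hyps: Sequence[str], refs: Sequence[str], utility: Utility) -> str:
    """Baseline: score every hypothesis against every pseudo-reference."""
    return max(hyps, key=lambda h: sum(utility(h, r) for r in refs) / len(refs))


def mbr_pruned(
    hyps: Sequence[str],
    draw_reference: Callable[[], str],  # assumed: returns one model sample
    utility: Utility,
    schedule: Sequence[int] = (8, 16, 32, 64),  # assumed increasing sample sizes
    alpha: float = 0.9,  # assumed confidence threshold
    n_boot: int = 200,  # assumed number of bootstrap resamples
    rng: Optional[random.Random] = None,
) -> str:
    """Grow the pseudo-reference set step by step, pruning weak hypotheses."""
    rng = rng or random.Random(0)
    alive: List[int] = list(range(len(hyps)))
    # scores[i][j] = utility of hypothesis i against the j-th pseudo-reference.
    scores = {i: [] for i in alive}
    n_refs = 0
    for size in schedule:
        if len(alive) == 1:
            break  # a single survivor needs no further utility calls
        new_refs = [draw_reference() for _ in range(size - n_refs)]
        n_refs = size
        # Only surviving hypotheses are scored against the new references.
        for i in alive:
            scores[i].extend(utility(hyps[i], r) for r in new_refs)
        best = max(alive, key=lambda i: sum(scores[i]))
        # Bootstrap over references: how often does each hypothesis
        # match or beat the current best on a resampled reference set?
        wins = {i: 0 for i in alive}
        for _ in range(n_boot):
            idx = [rng.randrange(n_refs) for _ in range(n_refs)]
            best_total = sum(scores[best][j] for j in idx)
            for i in alive:
                if sum(scores[i][j] for j in idx) >= best_total:
                    wins[i] += 1
        # Keep hypotheses whose estimated win rate is at least 1 - alpha.
        alive = [i for i in alive if wins[i] / n_boot >= 1 - alpha]
    return hyps[max(alive, key=lambda i: sum(scores[i]))]
```

In this sketch the baseline makes `len(hyps) * len(refs)` utility calls, whereas the pruned variant stops scoring a hypothesis as soon as it is dropped, so its cost depends on how quickly the candidate set shrinks. The current best hypothesis always survives a pruning round, since it ties with itself on every resample.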