Minimum Bayes Risk (MBR) decoding is a method for choosing the output of a machine learning system: rather than selecting the candidate with the highest probability, it selects the candidate with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP work, and knowledge of the method itself is limited. We first provide an introduction to the method and the recent literature. We then show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. Finally, we provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.
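To make the selection rule concrete, the following is a minimal sketch of sample-based MBR decoding, not the implementation evaluated in this work. It assumes the candidates are samples drawn from the model, so that averaging a utility over the other candidates approximates the expected utility, and it takes risk to be one minus utility. The names `unigram_f1` and `mbr_decode` are hypothetical, and the whitespace-token F1 is only a stand-in for whatever task metric would be used in practice.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (an assumption): F1 over whitespace tokens."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility against the others.

    With risk defined as 1 - utility, maximizing the mean utility is the
    same as minimizing the estimated expected risk.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i):
        # Every other candidate serves as a pseudo-reference with equal weight.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "dogs bark loudly",
]
print(mbr_decode(samples))
```

On these four toy samples the sketch selects "the cat sat on a mat", the candidate that agrees most with the rest of the pool, while the outlier "dogs bark loudly" scores zero against every other candidate. The pairwise comparison costs a number of utility calls quadratic in the number of candidates, which is the additional inference-time cost mentioned above.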