Minimum Bayes Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding in a variety of text generation tasks. MBR decoding selects, from a pool of hypotheses, the one with the least expected risk under a probability model according to a given utility function. Since computing the expected risk exactly over all possible hypotheses is impractical, MBR commonly relies on two approximations. First, it integrates over a sampled set of hypotheses rather than over all possible hypotheses. Second, it estimates the probability of each hypothesis using a Monte Carlo estimator. While the first approximation is necessary for computational feasibility, the second is not essential, since the model probability is typically available at inference time. We propose Model-Based MBR (MBMBR), a variant of MBR that uses the model probability itself, rather than the Monte Carlo estimate, as the estimate of the probability distribution. We show analytically and empirically that the model-based estimate is more promising than the Monte Carlo estimate in text generation tasks. Our experiments show that MBMBR outperforms MBR in several text generation tasks, both with encoder-decoder models and with large language models.
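To make the contrast between the two estimators concrete, the following is a minimal sketch, not the authors' implementation. It assumes that risk is the negative of utility (so minimizing expected risk equals maximizing expected utility), that the model-based weights are the model probabilities renormalized over the unique sampled hypotheses, and that a toy token-overlap score stands in for a real utility function. All function names and the example data are hypothetical.

```python
import math
from collections import Counter


def token_overlap(hyp, ref):
    """Toy utility (assumption): Jaccard overlap of whitespace tokens."""
    h, r = set(hyp.split()), set(ref.split())
    union = h | r
    return len(h & r) / len(union) if union else 1.0


def monte_carlo_weights(samples):
    """Monte Carlo estimate: relative frequency of each hypothesis among the samples."""
    counts = Counter(samples)
    n = len(samples)
    return {y: c / n for y, c in counts.items()}


def model_based_weights(samples, logprob):
    """Model-based estimate (assumed form): model probabilities of the unique
    sampled hypotheses, renormalized so they sum to one over that set."""
    unique = list(dict.fromkeys(samples))  # de-duplicate, keep first-seen order
    m = max(logprob[y] for y in unique)    # subtract the max for numerical stability
    unnorm = {y: math.exp(logprob[y] - m) for y in unique}
    z = sum(unnorm.values())
    return {y: w / z for y, w in unnorm.items()}


def mbr_select(weights, utility=token_overlap):
    """Return the hypothesis with the highest expected utility under `weights`."""
    def expected_utility(h):
        return sum(p * utility(h, y) for y, p in weights.items())
    return max(weights, key=expected_utility)


# Hypothetical samples and model log-probabilities, for illustration only.
samples = ["a b c", "x y z", "x y z"]
logprob = {"a b c": math.log(0.6), "x y z": math.log(0.2)}

print("MBR   :", mbr_select(monte_carlo_weights(samples)))
print("MBMBR :", mbr_select(model_based_weights(samples, logprob)))
```

In this toy pool the two estimators disagree because the sampled frequencies do not match the model probabilities. That is the situation the model-based estimate is meant to address, although the example says nothing about which choice is better in practice.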