Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural utility functions. However, the performance of MBR decoding depends heavily on how, and how many, candidates are sampled from the model. In this paper, we explore how different sampling approaches for generating candidate lists for MBR decoding affect performance. We evaluate popular sampling approaches, such as ancestral, nucleus, and top-k sampling. Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon, ensuring that each token in a sample receives a fair share of the probability mass. Through extensive human evaluations, we demonstrate that MBR decoding based on epsilon-sampling significantly outperforms not only beam search decoding, but also MBR decoding with all other tested sampling methods across four language pairs.
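To make the pruning rule and the MBR selection step concrete, here is a toy sketch in Python, not the paper's implementation. At each decoding step, tokens with probability below epsilon are dropped and the remaining mass is renormalized before sampling. The sampled candidates then serve both as hypotheses and as pseudo-references, and the candidate with the highest summed utility is selected. `toy_model` is a hypothetical stand-in for an NMT model's next-token distribution, and `unigram_f1` is a simple stand-in for the neural utility functions the abstract refers to. Both are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

EOS = "</s>"

def epsilon_filter(probs: Dict[str, float], epsilon: float) -> Dict[str, float]:
    """Drop tokens with probability < epsilon, then renormalize the survivors."""
    kept = {tok: p for tok, p in probs.items() if p >= epsilon}
    if not kept:  # if every token is pruned, keep the single most likely one
        best = max(probs, key=probs.get)
        kept = {best: probs[best]}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def sample_epsilon(next_token_probs: Callable[[List[str]], Dict[str, float]],
                   epsilon: float, max_len: int, rng: random.Random) -> List[str]:
    """Draw one sequence token by token from the epsilon-truncated distribution."""
    seq: List[str] = []
    for _ in range(max_len):
        dist = epsilon_filter(next_token_probs(seq), epsilon)
        tokens = list(dist)
        tok = rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
        if tok == EOS:
            break
        seq.append(tok)
    return seq

def unigram_f1(hyp: List[str], ref: List[str]) -> float:
    """Toy utility (assumption): unigram-overlap F1 instead of a neural metric."""
    if not hyp or not ref:
        return float(hyp == ref)
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(hyp), overlap / len(ref)
    return 2 * p * r / (p + r)

def mbr_select(candidates: List[List[str]],
               utility: Callable[[List[str], List[str]], float]) -> List[str]:
    """Pick the candidate with the highest total utility against all candidates,
    which act as pseudo-references (each candidate is also compared with itself)."""
    return max(candidates, key=lambda h: sum(utility(h, r) for r in candidates))

def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distributions that depend only on prefix length."""
    if len(prefix) == 0:
        return {"the": 0.55, "a": 0.40, "banana": 0.05}
    if len(prefix) == 1:
        return {"cat": 0.70, "dog": 0.27, "xylophone": 0.03}
    return {EOS: 0.95, "sat": 0.05}

if __name__ == "__main__":
    rng = random.Random(0)
    cands = [sample_epsilon(toy_model, epsilon=0.1, max_len=5, rng=rng) for _ in range(16)]
    print(mbr_select(cands, unigram_f1))
```

With `epsilon=0.1`, the low-probability tails ("banana", "xylophone", "sat") can never be sampled. Plain ancestral sampling would occasionally draw them, and those draws would then enter the MBR candidate pool.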