A central challenge in text generation is producing outputs that are not only correct but also diverse. Minimum Bayes-Risk (MBR) decoding has recently gained prominence for generating the highest-quality sentences among decoding algorithms. However, existing algorithms for generating diverse outputs are predominantly based on beam search or random sampling, so their output quality is capped by these underlying methods. In this paper, we investigate an alternative approach: we develop diversity-promoting decoding algorithms by imposing diversity objectives on MBR decoding. We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), which generate a set of sentences with both high quality and diversity. We evaluate DMBR and KMBR on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting. The experimental results show that the proposed methods achieve a better trade-off between quality and diversity than diverse beam search and sampling algorithms.
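To make the idea of adding a diversity objective to MBR decoding concrete, the following is a minimal sketch and not the exact DMBR or KMBR procedure, which the abstract does not specify. It assumes a toy token-overlap utility (`jaccard`), a greedy selection rule, and a penalty weight `lam`; all function and parameter names are hypothetical. Standard MBR picks the single candidate with the highest average utility against the sampled outputs, while the diverse variant sketched here greedily builds a set of `k` candidates and penalizes similarity to those already chosen.

```python
from typing import Callable, List

Utility = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Toy utility (an assumption): token-set overlap between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def expected_utility(cand: str, samples: List[str], utility: Utility) -> float:
    """Monte Carlo estimate of a candidate's expected utility over the samples."""
    return sum(utility(cand, s) for s in samples) / len(samples)


def mbr_select(samples: List[str], utility: Utility = jaccard) -> str:
    """Plain MBR: return the sample with the highest estimated expected utility."""
    return max(samples, key=lambda c: expected_utility(c, samples, utility))


def diverse_mbr_select(
    samples: List[str], k: int, lam: float, utility: Utility = jaccard
) -> List[str]:
    """Greedy sketch of a diversity-penalized MBR objective (hypothetical form).

    Each step adds the candidate that maximizes its expected utility minus
    lam times its total similarity to the candidates already selected.
    """
    candidates = list(dict.fromkeys(samples))  # drop exact duplicates, keep order
    scores = {c: expected_utility(c, samples, utility) for c in candidates}
    selected: List[str] = []
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda c: scores[c] - lam * sum(utility(c, h) for h in selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    samples = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on the mat",
        "dogs run fast",
    ]
    print(mbr_select(samples))
    print(diverse_mbr_select(samples, k=2, lam=0.0))
    print(diverse_mbr_select(samples, k=2, lam=1.0))
```

With `lam=0.0` the sketch reduces to taking the top-`k` candidates by expected utility, which tend to be near-duplicates; increasing `lam` trades some expected utility for dissimilarity within the selected set. A $k$-medoids formulation would instead cluster the samples and return one representative per cluster.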