Some Natural Language Generation (NLG) tasks require both faithfulness and diversity, and the decoding strategy strongly affects the quality of the generated text. Deterministic strategies such as beam search and greedy search tend to produce text with low diversity and high repetition, whereas guided decoding, which aims to improve diversity, may generate unfaithful expressions. To address this, this paper presents Information Filter upon Diversity-Improved Decoding (IFDID), which balances diversity and faithfulness. IFDID is a two-stage decoding strategy built on the proposed Enhance-Filter framework: it first increases the probabilities of some typical tokens being selected and then filters these tokens by their information amount. To verify its effectiveness, we compare our method with other baselines on the CommonGEN, RocStories and AdGen benchmarks, which cover both English and Chinese datasets. Both numerical experiments and human evaluation verify the effectiveness of the proposed approach: compared with traditional approaches, IFDID achieves a ROUGE score (reflecting faithfulness) 1.24 points higher and a Dist-2 score (reflecting diversity) 62.5% higher, demonstrating that IFDID is a novel state-of-the-art (SOTA) decoding strategy for balancing diversity and faithfulness.
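To make the two-stage "enhance, then filter" idea concrete, the following is a minimal sketch of a single decoding step under stated assumptions; it is not the authors' IFDID implementation. The abstract does not specify how typical tokens are chosen or how information is thresholded, so everything here is hypothetical: the caller supplies the set of token ids to boost (`boosted_ids`), the enhance stage adds a constant bonus `alpha` to their logits, and the filter stage drops tokens whose self-information, -log p, exceeds a threshold `max_info` (in nats) before sampling from the renormalized distribution.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def enhance(logits: Sequence[float], boosted_ids: Set[int], alpha: float) -> List[float]:
    """Stage 1 (hypothetical): add a bonus `alpha` to the logits of the
    token ids in `boosted_ids`, raising their selection probability."""
    return [x + alpha if i in boosted_ids else x for i, x in enumerate(logits)]


def info_filter(probs: Sequence[float], max_info: float) -> Dict[int, float]:
    """Stage 2 (hypothetical): keep tokens whose self-information -log p
    is at most `max_info` nats, then renormalize. The most probable token
    is always kept so the filtered distribution is never empty."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    kept = {
        i: p
        for i, p in enumerate(probs)
        if p > 0 and (-math.log(p) <= max_info or i == best)
    }
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


def enhance_filter_step(
    logits: Sequence[float],
    boosted_ids: Set[int],
    alpha: float = 1.0,
    max_info: float = 3.0,
    rng: Optional[random.Random] = None,
) -> int:
    """One decoding step: enhance the logits, filter by information, sample an id."""
    rng = rng or random.Random()
    probs = softmax(enhance(logits, boosted_ids, alpha))
    dist = info_filter(probs, max_info)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    # Toy 4-token vocabulary; token 2 is treated as a "typical" token to boost.
    logits = [2.0, 1.0, 0.5, -3.0]
    print(enhance_filter_step(logits, boosted_ids={2}, rng=random.Random(0)))
```

With these toy logits, boosting token 2 raises its probability from about 0.14 to about 0.31, while token 3 (self-information of roughly 5.7 nats) is removed by the filter, so sampling only ever returns ids 0, 1 or 2. The design separates the diversity-oriented step (enhance) from the faithfulness-oriented step (filter), which mirrors the tradeoff the abstract describes; in a real model these functions would operate on the logits produced at each generation step.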