We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. (2022) -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between a strong model and a weak model. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler failure modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
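To make the "weighted difference in likelihood" concrete, here is a minimal sketch of a single token-selection step. The abstract does not give the exact formula, so the details are assumptions for illustration: the weighting `(1 + beta) * strong - beta * weak`, the plausibility cutoff `alpha`, and the function names `contrastive_scores` and `pick_next_token` are all hypothetical, and the logits are a toy four-token vocabulary, not model outputs.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(strong_logits, weak_logits, alpha=0.1, beta=0.5):
    """Score each token by a weighted strong-minus-weak log-likelihood.

    Assumed form (not stated in the abstract):
      - tokens whose strong-model probability is below alpha times the
        strong model's top probability are masked out (-inf);
      - remaining tokens get (1 + beta) * log p_strong - beta * log p_weak.
    """
    strong_lp = log_softmax(strong_logits)
    weak_lp = log_softmax(weak_logits)
    cutoff = math.log(alpha) + max(strong_lp)
    return [
        (1 + beta) * s - beta * w if s >= cutoff else float("-inf")
        for s, w in zip(strong_lp, weak_lp)
    ]


def pick_next_token(strong_logits, weak_logits, alpha=0.1, beta=0.5):
    """Return the index of the highest-scoring token."""
    scores = contrastive_scores(strong_logits, weak_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: both models like token 0, but the weak model likes it even
# more than the strong one does, so the contrast favors token 1 instead.
strong = [2.0, 1.8, -3.0, 0.0]
weak = [2.5, 0.5, -6.0, 0.0]

greedy_choice = max(range(len(strong)), key=strong.__getitem__)
contrastive_choice = pick_next_token(strong, weak)
print(greedy_choice, contrastive_choice)
```

In this toy case greedy decoding on the strong model alone selects token 0, while the contrastive score selects token 1, the token the strong model rates much more highly than the weak model does. Token 2 is excluded by the cutoff, which keeps the contrast from promoting tokens the strong model itself considers implausible. A full decoder would repeat this step at every position, for example inside a greedy or beam search loop.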