We demonstrate that Contrastive Decoding, a simple, computationally light, and training-free text generation method proposed by Li et al. (2022), achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between a strong model and a weak model. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5, and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5, and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler failure modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
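To make the "weighted difference in likelihood" concrete, here is a minimal sketch of a single decoding step. The abstract does not give the exact scoring rule, so the details are assumptions chosen for illustration: the function name `contrastive_pick`, the weight `beta`, the form of the score `(1 + beta) * log p_strong - beta * log p_weak`, and the plausibility cutoff `alpha`, which discards tokens the strong model itself finds unlikely.

```python
import math


def contrastive_pick(strong_logprobs, weak_logprobs, alpha=0.1, beta=0.5):
    """Return the index of the next token with the highest contrastive score.

    Both inputs are per-token log-probabilities over the same vocabulary.
    alpha and beta are illustrative hyperparameters, not values from the paper.
    """
    # Plausibility cutoff (assumed): keep only tokens whose strong-model
    # probability is at least alpha times that of the strong model's top token.
    cutoff = math.log(alpha) + max(strong_logprobs)

    best_index, best_score = None, -math.inf
    for i, (lp_s, lp_w) in enumerate(zip(strong_logprobs, weak_logprobs)):
        if lp_s < cutoff:
            continue  # implausible under the strong model, so skip it
        # Weighted difference: reward tokens the strong model prefers
        # more than the weak model does.
        score = (1 + beta) * lp_s - beta * lp_w
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Toy three-token vocabulary: the weak model heavily favors token 0.
strong = [math.log(p) for p in (0.5, 0.4, 0.1)]
weak = [math.log(p) for p in (0.8, 0.1, 0.1)]
print(contrastive_pick(strong, weak))  # prints 1
```

In this toy example greedy decoding on the strong model alone would choose token 0, while the contrastive score prefers token 1, which the strong model rates far higher than the weak model does. Setting `beta=0` in this sketch reduces it to greedy decoding over the strong model.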