While large language models have proven effective in a wide range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. Because the reward model is unidirectional, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring minimal computational overhead.
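As a rough illustration of the rescaling step described above, the following minimal sketch reweights a toy next-token distribution using a reward function. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the additive shift `logit + beta * reward`, the restriction to the `top_k` candidates, the names `reward_reweighted_probs` and `toy_reward`, and the toy vocabulary. For simplicity the reward function rescores the full prefix for each candidate, so the activation caching enabled by a unidirectional reward model is not shown.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reward_reweighted_probs(
    logits: Dict[str, float],
    prefix: List[str],
    reward_fn: Callable[[List[str]], float],
    beta: float = 5.0,  # assumed steering strength (hypothetical parameter)
    top_k: int = 3,     # assumed candidate cutoff (hypothetical parameter)
) -> Dict[str, float]:
    """Rescale next-token probabilities to favor high-reward tokens."""
    # Keep only the top_k most likely candidates under the language model.
    candidates = sorted(logits, key=logits.get, reverse=True)[:top_k]
    # Score each candidate continuation and shift its logit by the reward.
    adjusted = [logits[t] + beta * reward_fn(prefix + [t]) for t in candidates]
    return dict(zip(candidates, softmax(adjusted)))


def toy_reward(tokens: List[str]) -> float:
    """Stand-in for a learned reward model: penalize one unwanted word."""
    return 0.0 if "awful" in tokens else 1.0


# Toy next-token logits; "awful" is the most likely token before reweighting.
logits = {"awful": 2.0, "fine": 1.0, "great": 1.0, "the": -1.0}
probs = reward_reweighted_probs(logits, ["this", "is"], toy_reward)

# Sample the next token from the reweighted distribution.
rng = random.Random(0)
next_token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
```

In this toy setting, the reweighted distribution moves most of the probability mass from `awful` to `fine` and `great`, while `the` is dropped by the top-k cutoff. Larger values of `beta` steer more strongly toward high-reward tokens at the cost of drifting further from the language model's original distribution.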