While large language models have proven effective in a wide range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text with certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. Because the reward model is unidirectional, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring minimal computational overhead.
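To make the decoding step concrete, the following is a minimal sketch of reward-based rescaling for a single generation step. It is an illustration under stated assumptions, not the paper's implementation: the abstract says only that sampling probabilities are rescaled to favor high-reward tokens, so the additive logit shift weighted by `beta`, the restriction to the `top_k` candidates, and the names `reweight_with_rewards`, `sample_token`, and `toy_reward` are all hypothetical. The sketch also omits the activation caching that a unidirectional reward model makes possible.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def reweight_with_rewards(
    logits: Dict[str, float],
    prefix: Sequence[str],
    reward_fn: Callable[[Sequence[str]], float],
    beta: float = 5.0,
    top_k: int = 3,
) -> Dict[str, float]:
    """Return a sampling distribution over the top-k tokens, shifted by reward.

    Assumption: rescaling is done by adding beta * reward to each logit.
    """
    # Keep only the k most likely candidates, so the reward model runs k times.
    candidates: List[str] = sorted(logits, key=lambda t: logits[t], reverse=True)[:top_k]
    # Score each candidate continuation (prefix + token) and shift its logit.
    adjusted = {
        tok: logits[tok] + beta * reward_fn(list(prefix) + [tok])
        for tok in candidates
    }
    # Numerically stable softmax over the adjusted logits.
    peak = max(adjusted.values())
    exps = {tok: math.exp(value - peak) for tok, value in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from the rescaled distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def toy_reward(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a reward model: penalize the token 'rude'."""
    return 0.0 if tokens[-1] == "rude" else 1.0


# Toy next-token logits from a hypothetical language model.
toy_logits = {"rude": 2.0, "kind": 1.5, "calm": 1.0, "odd": -3.0}
probs = reweight_with_rewards(toy_logits, ["you", "are"], toy_reward)
next_token = sample_token(probs, random.Random(0))
```

In this toy setup the language model alone prefers `rude`, but after rescaling the two high-reward candidates carry most of the probability mass. A real reward model would score the full continuation, and a larger `beta` would push the distribution further toward high-reward tokens.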