Language models trained on large amounts of data require careful tuning before they can be deployed safely in the real world. We revisit the guided decoding paradigm, in which the logits of a base language model are augmented with scores from a task-specific reward model. We propose a simple but efficient parameterization of the autoregressive reward model that enables fast and effective guided decoding. On detoxification and sentiment control tasks, we show that our parameterization performs on par with RAD, a strong but less efficient guided decoding approach.
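To make the guided decoding idea concrete, the following is a minimal sketch of a single decoding step, not the parameterization proposed here nor the RAD implementation. It assumes the reward model has already produced one score per candidate token; the function name `guided_next_token_probs`, the guidance strength `beta`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math


def guided_next_token_probs(base_logits, reward_scores, beta=1.0):
    """Combine base-LM logits with reward scores and return a distribution.

    base_logits   : list of floats, one per candidate token (from the base LM)
    reward_scores : list of floats, one per candidate token (from the reward model)
    beta          : hypothetical guidance strength; beta = 0 recovers the base model
    """
    if len(base_logits) != len(reward_scores):
        raise ValueError("need exactly one reward score per candidate token")

    # Augment each base logit with its scaled reward score.
    guided = [logit + beta * score for logit, score in zip(base_logits, reward_scores)]

    # Numerically stable softmax over the guided logits.
    peak = max(guided)
    exps = [math.exp(g - peak) for g in guided]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example (made-up numbers): three candidate tokens, where the
# reward model strongly penalizes the first candidate.
base_logits = [2.0, 1.0, 0.5]
reward_scores = [-3.0, 0.0, 0.0]

base_probs = guided_next_token_probs(base_logits, reward_scores, beta=0.0)
guided_probs = guided_next_token_probs(base_logits, reward_scores, beta=1.0)
```

In this toy setting the first token is the most likely one under the base model, and it becomes the least likely once the reward scores are added. How cheaply the per-token reward scores can be obtained at each step is the efficiency question that the reward model parameterization addresses.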