RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Generative foundation models are susceptible to implicit biases that arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Aligning these models with human ethics and preferences is therefore an essential step toward their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem: generative models are fine-tuned with RL algorithms guided by a reward model informed by human feedback. However, the inefficiencies and instabilities of RL algorithms frequently present substantial obstacles to successful alignment, which motivates a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models effectively. Given a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and then enhances the model by fine-tuning on the filtered samples. Our studies show that RAFT effectively improves model performance in reward learning and on other automated metrics, for both large language models and diffusion models.
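The sample, rank, and fine-tune loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the "model" is a categorical distribution over five candidate outputs, "fine-tuning" is a smoothed maximum-likelihood refit, and the names `raft_step`, `run_raft`, `reward_fn`, `keep_ratio`, and `smoothing` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def expected_reward(policy, reward_fn):
    """Average reward of a categorical policy given as {output: probability}."""
    return sum(prob * reward_fn(out) for out, prob in policy.items())


def raft_step(policy, reward_fn, batch_size, keep_ratio, rng, smoothing=1e-3):
    """One illustrative round: sample, rank by reward, keep the top fraction, refit."""
    outputs = list(policy)
    weights = [policy[out] for out in outputs]

    # 1. Draw a batch of samples from the current model.
    samples = rng.choices(outputs, weights=weights, k=batch_size)

    # 2. Rank samples by reward and keep only the highest-scoring fraction.
    ranked = sorted(samples, key=reward_fn, reverse=True)
    kept = ranked[: max(1, int(keep_ratio * batch_size))]

    # 3. Stand-in for fine-tuning (an assumption of this sketch): refit the
    #    categorical distribution to the kept samples, with light smoothing
    #    so that no output ends up with exactly zero probability.
    counts = Counter(kept)
    total = len(kept) + smoothing * len(outputs)
    return {out: (counts[out] + smoothing) / total for out in outputs}


def run_raft(policy, reward_fn, rounds, batch_size, keep_ratio, seed=0):
    """Repeat the sample-rank-refit round a fixed number of times."""
    rng = random.Random(seed)
    for _ in range(rounds):
        policy = raft_step(policy, reward_fn, batch_size, keep_ratio, rng)
    return policy


# Toy setup (hypothetical): outputs are the integers 0..4 and the reward of an
# output is simply its value, so higher outputs count as higher quality.
def toy_reward(output):
    return float(output)


initial_policy = {out: 0.2 for out in range(5)}
final_policy = run_raft(initial_policy, toy_reward, rounds=3,
                        batch_size=200, keep_ratio=0.25)
```

In this toy setting the refit concentrates probability on high-reward outputs, so `expected_reward(final_policy, toy_reward)` exceeds the value for `initial_policy`. A real system would replace the categorical policy with a language or diffusion model, the reward function with a learned reward model, and the refit with supervised fine-tuning on the kept samples.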

