RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators. Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.
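The select-then-fine-tune loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `raft_select` and `raft_iterate`, the per-prompt best-of-`k` selection rule, and the `generate`, `reward`, and `finetune` callables are all assumptions made for the example. The generator is treated as a black-box callable, which reflects the abstract's point that sample generation needs no gradients.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (prompt, response)


def raft_select(
    prompts: Sequence[str],
    generate: Callable[[str], str],        # hypothetical black-box generator
    reward: Callable[[str, str], float],   # hypothetical reward model
    k: int = 8,
    keep: int = 1,
) -> List[Sample]:
    """Sample k responses per prompt and keep the `keep` highest-reward ones.

    Assumption: ranking is done per prompt; the abstract only says that
    high-reward samples are kept and the rest are discarded.
    """
    if not 1 <= keep <= k:
        raise ValueError("keep must satisfy 1 <= keep <= k")
    selected: List[Sample] = []
    for prompt in prompts:
        # Generation only calls the model forward; no gradients are needed.
        candidates = [generate(prompt) for _ in range(k)]
        ranked = sorted(candidates, key=lambda c: reward(prompt, c), reverse=True)
        selected.extend((prompt, c) for c in ranked[:keep])
    return selected


def raft_iterate(
    prompts: Sequence[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    finetune: Callable[[Callable[[str], str], List[Sample]], Callable[[str], str]],
    rounds: int = 3,
    k: int = 8,
    keep: int = 1,
) -> Callable[[str], str]:
    """Alternate between reward-ranked selection and supervised fine-tuning.

    `finetune` is a hypothetical hook: it takes the current generator and the
    selected samples and returns an updated generator.
    """
    for _ in range(rounds):
        dataset = raft_select(prompts, generate, reward, k=k, keep=keep)
        generate = finetune(generate, dataset)
    return generate
```

With `rounds=1` and a fixed generator this reduces to an offline filtering pass; with several rounds the dataset is rebuilt from the updated model each time, which is one way to read the "streaming dataset" in the abstract.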

