Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when the user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be ineffective in this setting. We then propose a general Personalized-RLHF (P-RLHF) framework, which includes a user model that maps user information to user representations and can flexibly encode our assumptions about user preferences. We develop new learning objectives for personalized Direct Preference Optimization that jointly learn a user model and a personalized language model. We demonstrate the efficacy of our proposed method through (1) a synthetic task, where we fine-tune a GPT-J 6B model to align with users who hold conflicting preferences over generation length; and (2) an instruction-following task, where we fine-tune a Tulu-7B model to generate responses for users with diverse preferences over response style. In both cases, our learned models generate personalized responses that are better aligned with the preferences of individual users.
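For intuition, one plausible instantiation of such a personalized DPO objective (an illustrative sketch under stated assumptions, not necessarily the exact objective used in the paper) conditions the policy on the learned user representation $f_\phi(u)$:

$$
\mathcal{L}_{\text{P-DPO}}(\theta,\phi) \;=\; -\,\mathbb{E}_{(u,\,x,\,y_w,\,y_l)}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta\big(y_w \mid x, f_\phi(u)\big)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta\log\frac{\pi_\theta\big(y_l \mid x, f_\phi(u)\big)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],
$$

where $f_\phi$ denotes the user model, $\pi_\theta$ the personalized language model, $\pi_{\mathrm{ref}}$ the reference model, $\beta$ the usual DPO temperature, and $(u, x, y_w, y_l)$ a user, a prompt, and the user's preferred and dispreferred responses. Minimizing such an objective updates $\theta$ and $\phi$ jointly, which is the sense in which the user model and the personalized language model are learned together.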