The performance of the reward model (RM) is a critical factor in improving the effectiveness of the large language model (LLM) during alignment fine-tuning. Two challenges remain in RM training: 1) training the same RM on various categories of data may cause its generalization performance to suffer from multi-task disturbance, and 2) the human annotation consistency rate is generally only $60\%$ to $75\%$, so the training data contain a lot of noise. To tackle these two challenges, we introduce the idea of Mixture-of-Experts (MoE) into the field of RM for the first time and propose the Double-Layer MoE RM (DMoERM). The outer layer is a sparse MoE: after classifying an input into a task category, we route it to the corresponding inner-layer task-specific model. The inner layer is a dense MoE: we decompose the specific task into multiple capability dimensions, individually fine-tune a LoRA expert on each one, and synthesize their outputs with an MLP to compute the final reward. To minimize costs, we call a public LLM API to obtain the capability preference labels. Validation on manually labeled datasets confirms that our model attains superior consistency with human preferences and outstrips advanced generative approaches. Meanwhile, through Best-of-n (BoN) sampling and RL experiments, we demonstrate that our model outperforms state-of-the-art RM ensemble methods and mitigates the overoptimization problem. Our code and dataset are available at: https://github.com/quanshr/DMoERM-v1.
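To make the double-layer architecture concrete, below is a minimal PyTorch sketch of the routing scheme described above. It is an illustration under simplifying assumptions, not the paper's implementation: plain `nn.Linear` heads stand in for the LoRA-fine-tuned experts (which in the paper are adapters on a shared LM backbone), the outer router is a simple linear classifier, and all names (`InnerMoERM`, `DMoERM`, `hidden_dim`, `n_tasks`, `n_capabilities`) are hypothetical.

```python
import torch
import torch.nn as nn


class InnerMoERM(nn.Module):
    """Dense inner-layer MoE for one task category: every expert scores the
    input along one capability dimension, and an MLP aggregates the
    per-dimension scores into a single scalar reward."""

    def __init__(self, hidden_dim: int, n_capabilities: int):
        super().__init__()
        # Linear heads stand in for the paper's LoRA-fine-tuned experts.
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(n_capabilities)]
        )
        self.aggregator = nn.Sequential(
            nn.Linear(n_capabilities, n_capabilities),
            nn.ReLU(),
            nn.Linear(n_capabilities, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) pooled representation of a (prompt, response) pair.
        per_dim = torch.cat([expert(h) for expert in self.experts], dim=-1)
        return self.aggregator(per_dim).squeeze(-1)  # (batch,)


class DMoERM(nn.Module):
    """Sparse outer layer: a task classifier hard-routes each input to
    exactly one task-specific inner model."""

    def __init__(self, hidden_dim: int, n_tasks: int, n_capabilities: int):
        super().__init__()
        self.router = nn.Linear(hidden_dim, n_tasks)  # task-category classifier
        self.inner = nn.ModuleList(
            [InnerMoERM(hidden_dim, n_capabilities) for _ in range(n_tasks)]
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        task = self.router(h).argmax(dim=-1)  # top-1 routing: one inner model per input
        rewards = torch.empty(h.size(0), device=h.device)
        for t in task.unique():
            mask = task == t
            rewards[mask] = self.inner[int(t)](h[mask])
        return rewards


if __name__ == "__main__":
    model = DMoERM(hidden_dim=1024, n_tasks=5, n_capabilities=4)
    h = torch.randn(8, 1024)
    print(model(h).shape)  # torch.Size([8])
```

The design choice mirrored here is that outer routing is hard and sparse (each input activates only one task-specific model), while within a task the MoE is dense: every capability expert contributes a score, and the MLP learns how to weight the per-dimension scores into the final reward.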