Recent advancements in large language models (LLMs) have led to their increased application across various tasks, with reinforcement learning from human feedback (RLHF) being a crucial part of their training to align responses with user intentions. In the RLHF process, a reward model is trained on response preferences determined by human labelers or AI systems, and this reward model is then used to refine the LLM through reinforcement learning. This work introduces weak supervision as a strategy to extend RLHF datasets and enhance reward model performance. Weak supervision employs noisy or imprecise data labeling, reducing reliance on expensive manually labeled data. We analyzed RLHF datasets to identify heuristics that correlate with response preference, wrote simple labeling functions based on those heuristics, and calibrated a label model to weakly annotate unlabeled data. Our evaluation shows that weak supervision significantly benefits smaller datasets by improving reward model performance, but its effectiveness decreases with larger, originally labeled datasets. Additionally, using an LLM to generate and then weakly label responses offers a promising method for extending preference data.
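To make the labeling-function-plus-label-model pipeline concrete, the following is a minimal sketch of how such weak annotation could look using the Snorkel library. The specific heuristics (response length, polite phrasing), the column names `response_a`/`response_b`, and the use of Snorkel itself are illustrative assumptions, not the paper's exact setup.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

# Label values: which of the two responses is preferred.
ABSTAIN, PREFER_A, PREFER_B = -1, 0, 1

@labeling_function()
def lf_length(x):
    # Illustrative heuristic: longer responses tend to be preferred.
    if len(x.response_a) == len(x.response_b):
        return ABSTAIN
    return PREFER_A if len(x.response_a) > len(x.response_b) else PREFER_B

@labeling_function()
def lf_politeness(x):
    # Illustrative heuristic: responses with polite/hedging phrasing are preferred.
    markers = ("thank", "please", "sorry")
    a = any(m in x.response_a.lower() for m in markers)
    b = any(m in x.response_b.lower() for m in markers)
    if a == b:
        return ABSTAIN
    return PREFER_A if a else PREFER_B

# Unlabeled preference pairs (hypothetical column names and toy data).
df_unlabeled = pd.DataFrame({
    "response_a": ["Sure, here is a short answer.", "No."],
    "response_b": ["I'm sorry, I can't help with that. Please rephrase.",
                   "Yes, and here is a detailed explanation of why ..."],
})

# Apply the labeling functions, then fit the generative label model
# that combines their (noisy, possibly conflicting) votes.
applier = PandasLFApplier(lfs=[lf_length, lf_politeness])
L = applier.apply(df=df_unlabeled)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L, n_epochs=200, seed=0)

# Weak preference labels used to extend the reward-model training set.
weak_labels = label_model.predict(L)
print(weak_labels)
```

In this sketch, each labeling function votes on a preference or abstains, and the label model weighs those votes to produce weak labels that can augment the human-labeled preference data for reward model training.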