To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider the human factors involved in providing different types of feedback. However, the systematic study of learning from diverse types of feedback is held back by the limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies of how human factors affect the effectiveness of each type. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.