Deploying Offline Reinforcement Learning with Human Feedback

Reinforcement learning (RL) has shown promise for decision-making tasks in real-world applications. One practical framework involves training parameterized policy models from an offline dataset and subsequently deploying them in an online environment. However, this approach can be risky since the offline training may not be perfect, leading to poor performance of the RL models that may take dangerous actions. To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase. We formalize this online deployment problem and develop two approaches. The first approach uses model selection and the upper confidence bound algorithm to adaptively select a model to deploy from a candidate set of trained offline RL models. The second approach involves fine-tuning the model in the online deployment phase when a supervision signal arrives. We demonstrate the effectiveness of these approaches for robot locomotion control and traffic light control tasks through empirical validation.

翻译：强化学习（RL）在现实应用中的决策任务中展现出巨大潜力。一种实际框架涉及从离线数据集中训练参数化策略模型，随后将其部署到在线环境中。然而，由于离线训练可能存在不完美之处，导致RL模型表现欠佳并可能采取危险行动，因此这种方法具有风险性。为解决此问题，我们提出一种替代框架：由人类监督RL模型，并在在线部署阶段提供额外反馈。我们形式化了这一在线部署问题，并开发了两种方法。第一种方法利用模型选择与上置信界算法，从已训练的离线RL模型候选集中自适应选择模型进行部署。第二种方法则在在线部署阶段当监督信号到达时对模型进行微调。通过机器人运动控制与交通信号灯控制任务的实证验证，我们展示了这些方法的有效性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

148页最新《深度强化学习》教程，148页ppt

专知会员服务

77+阅读 · 2023年4月29日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

86+阅读 · 2020年2月18日