Reinforcement learning (RL) algorithms struggle with long-horizon robot manipulation tasks in real-world environments because of sample inefficiency and safety concerns. To overcome these challenges, we propose SEED, a novel framework that combines two approaches: reinforcement learning from human feedback (RLHF) and primitive skill-based reinforcement learning. Both approaches are effective at addressing sparse rewards and the complexity of long-horizon tasks. By combining them, SEED reduces the human effort required in RLHF and makes RL training of robot manipulation safer in real-world settings. In addition, parameterized skills expose the agent's high-level intentions, allowing humans to evaluate skill choices before they are executed, which makes training both safer and more efficient. To evaluate SEED, we conducted extensive experiments on five manipulation tasks of varying complexity. Our results show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety. SEED also substantially reduces human effort compared to other RLHF methods. Further details and video results can be found at https://seediros23.github.io/.
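The idea that humans can evaluate a parameterized skill choice before it is executed can be illustrated with a minimal sketch. This is not SEED's implementation: the names `SkillChoice`, `gated_step`, and `run_demo`, the skill names, the three-value parameter layout, the binary feedback signal, and the scripted stand-in for the human evaluator are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SkillChoice:
    """A high-level action: a primitive skill plus its continuous parameters."""
    name: str                    # hypothetical skill name, e.g. "pick"
    params: Tuple[float, ...]    # hypothetical layout: a target (x, y, z)


def gated_step(
    choice: SkillChoice,
    evaluate: Callable[[SkillChoice], bool],
    execute: Callable[[SkillChoice], None],
) -> float:
    """Show the proposed skill to an evaluator before running it.

    Returns the feedback signal (1.0 approved, 0.0 rejected; a binary signal
    is an assumption). The skill is executed only when approved, so rejected
    choices never reach the robot.
    """
    approved = evaluate(choice)
    if approved:
        execute(choice)
    return 1.0 if approved else 0.0


def run_demo() -> Tuple[List[float], List[str]]:
    """Run three proposals through the gate with a stand-in evaluator."""
    executed: List[str] = []

    # Stand-in for a human: rejects any target below the table plane z = 0.
    def evaluate(choice: SkillChoice) -> bool:
        return choice.params[2] >= 0.0

    # Stand-in for the robot: records which skills were actually run.
    def execute(choice: SkillChoice) -> None:
        executed.append(choice.name)

    proposals = [
        SkillChoice("pick", (0.1, 0.2, 0.05)),
        SkillChoice("push", (0.3, 0.0, -0.10)),
        SkillChoice("place", (0.4, 0.1, 0.02)),
    ]
    feedback = [gated_step(c, evaluate, execute) for c in proposals]
    return feedback, executed
```

In this sketch the feedback is collected for every proposal, while only approved proposals are executed. That separation is what would allow an unsafe choice to contribute a learning signal without ever being run on hardware.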