Offline goal-conditioned reinforcement learning (GCRL) aims to solve goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal attainment without considering safety, while others focus excessively on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals. To evaluate performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle, and use expert and random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm. The results show that our method outperforms existing state-of-the-art methods by a large margin. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.