Long-horizon manipulation tasks such as stacking are a longstanding challenge in robotic manipulation, particularly for reinforcement learning (RL) methods, which often struggle to learn the correct sequence of actions needed to achieve such complex goals. Symbolic planning can provide this sequence through high-level reasoning; however, planners typically lack the low-level control precision required for reliable execution. This paper introduces a novel framework that integrates symbolic planning with hierarchical RL through the cooperation of high-level operators and low-level policies. Our contribution integrates planning operators (i.e., their preconditions and effects) into a hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We develop dual-purpose high-level operators that serve both in holistic planning and as independent, reusable policies. Our approach offers a flexible solution for long-horizon tasks such as cube stacking. Experimental results show that our method achieves an average success rate of 97.2% for learning and executing the full stacking sequence, along with high success rates for the independent policies, e.g., reach (98.9%), lift (99.7%), and stack (85%). Training time is also reduced by 68% when using our approach.
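The core idea of coupling symbolic operators with learned policies can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's implementation: the `Operator` class, the symbolic fact names, and the greedy forward search are all hypothetical stand-ins for how preconditions and effects could sequence low-level policies in a stacking task.

```python
# Hypothetical sketch of operator-based sequencing (not the paper's code).
# Each operator pairs symbolic preconditions/effects with a low-level policy;
# a simple forward search chains operators until the goal facts hold.
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # symbolic facts required before execution
    effects: frozenset        # facts assumed true once the policy succeeds


def plan(operators, initial_facts, goal_facts):
    """Greedy forward search: apply any applicable operator whose effects
    are not yet satisfied, until all goal facts hold."""
    state = set(initial_facts)
    sequence = []
    while not goal_facts <= state:
        applicable = [op for op in operators
                      if op.preconditions <= state and not op.effects <= state]
        if not applicable:
            raise RuntimeError("no applicable operator; goal unreachable")
        op = applicable[0]
        sequence.append(op.name)  # here a real system would run the RL policy
        state |= op.effects
    return sequence


# Illustrative operators for a cube-stacking sequence (fact names are made up).
operators = [
    Operator("reach", frozenset({"cube_visible"}), frozenset({"gripper_at_cube"})),
    Operator("grasp", frozenset({"gripper_at_cube"}), frozenset({"cube_grasped"})),
    Operator("lift", frozenset({"cube_grasped"}), frozenset({"cube_lifted"})),
    Operator("stack", frozenset({"cube_lifted"}), frozenset({"cube_stacked"})),
]

print(plan(operators, {"cube_visible"}, {"cube_stacked"}))
# → ['reach', 'grasp', 'lift', 'stack']
```

In this sketch the symbolic layer only decides *which* policy to run next; each operator's success would be determined by its low-level policy at execution time, which is where the hierarchical RL component would enter.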