Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, because it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. The lower-level primitive periodically relabels a handful of expert demonstrations using our primitive informed parsing approach. We derive bounds on the sub-optimality of our method and develop a practical algorithm for hierarchical reinforcement learning. Since our approach requires only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluation on complex maze navigation and robotic manipulation environments shows that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks.
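The abstract does not spell out the parsing rule, so the following is only an illustrative sketch of how primitive-informed relabeling could produce a subgoal curriculum. It assumes a hypothetical predicate `reachable(state, goal)` (for example, a thresholded value estimate of the current lower-level primitive) and a greedy rule: from each anchor state, advance along the demonstration to the farthest state the primitive still judges reachable, and use that state as the next subgoal. The names `parse_demo` and `reachable` are assumptions, not the authors' implementation.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def parse_demo(
    demo: Sequence[State],
    reachable: Callable[[State, State], bool],
) -> List[int]:
    """Greedily split one expert demonstration into subgoal indices.

    Hypothetical rule (an assumption, not the paper's exact method):
    starting at anchor index i, extend j while the lower-level primitive
    judges demo[j + 1] reachable from demo[i]; demo[j] becomes the next
    subgoal and the new anchor.
    """
    subgoals: List[int] = []
    last = len(demo) - 1
    i = 0
    while i < last:
        j = i + 1  # always advance at least one step so parsing terminates
        while j < last and reachable(demo[i], demo[j + 1]):
            j += 1
        subgoals.append(j)
        i = j
    return subgoals


if __name__ == "__main__":
    demo = list(range(8))  # toy 1-D states 0..7
    # A weak primitive reaches goals at most 2 units away ...
    print(parse_demo(demo, lambda s, g: abs(g - s) <= 2))
    # ... a stronger one reaches 3 units, so subgoals spread out.
    print(parse_demo(demo, lambda s, g: abs(g - s) <= 3))
```

With the toy 1-D demonstration, the weaker primitive yields subgoal indices `[2, 4, 6, 7]` and the stronger one `[3, 6, 7]`. Re-running the parse periodically as the primitive improves therefore lengthens the gap between consecutive subgoals, which is one way to read "a curriculum of achievable subgoals" for an evolving lower level.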