Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms by decomposing the complex learning problem into easier subtasks. Recent studies show that representations that preserve temporally abstract environment dynamics are successful in solving difficult problems and provide theoretical guarantees for optimality. These methods, however, cannot scale to tasks where the environment dynamics grow in complexity, i.e., where the temporally abstract transition relations depend on a larger number of variables. Other efforts have instead used spatial abstraction to mitigate these issues, but they are limited in their scalability to high-dimensional environments and their dependence on prior knowledge. In this paper, we propose a novel three-layer HRL algorithm that introduces, at different levels of the hierarchy, both a spatial and a temporal goal abstraction. We provide a theoretical study of the regret bounds of the learned policies. We evaluate the approach on complex continuous control tasks, demonstrating the effectiveness of the spatial and temporal abstractions it learns. Open-source code is available at https://github.com/cosynus-lix/STAR.
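To make the three-layer structure concrete, the following is a minimal, hypothetical sketch of how such a hierarchy can be wired together: a top level proposes a spatially abstract target, a middle level refines it into a temporally extended waypoint on a slower timescale, and a low level emits primitive actions. All names, the toy 1-D dynamics, and the horizons are illustrative assumptions, not the STAR implementation or its learned abstractions.

```python
class Level:
    """One layer of a hypothetical three-layer goal-conditioned hierarchy.

    Illustrative sketch only: names and behaviors are assumptions,
    not the algorithm from the paper.
    """
    def __init__(self, name, horizon, propose):
        self.name = name
        self.horizon = horizon    # steps before re-proposing a subgoal
        self.propose = propose    # (state, received_goal) -> subgoal or action

def run_hierarchy(top, mid, low, state, goal, max_steps=50):
    """Each level refines the goal it receives into a subgoal for the level
    below; the lowest level emits primitive actions (toy 1-D dynamics)."""
    region = subgoal = state
    for t in range(max_steps):
        if t % top.horizon == 0:
            region = top.propose(state, goal)      # spatially abstract target
        if t % mid.horizon == 0:
            subgoal = mid.propose(state, region)   # temporally extended waypoint
        state += low.propose(state, subgoal)       # primitive action
        if abs(state - goal) < 0.5:
            break
    return state

# Toy instantiation: positions on a line, unit-speed low-level controller.
top = Level("spatial", 10, lambda s, g: g)
mid = Level("temporal", 5, lambda s, r: s + max(-5.0, min(5.0, r - s)))
low = Level("control", 1, lambda s, sg: max(-1.0, min(1.0, sg - s)))

print(run_hierarchy(top, mid, low, state=0.0, goal=8.0))  # reaches 8.0
```

The key design point the sketch illustrates is the separation of timescales: the two upper levels re-plan only every `horizon` steps, so each operates over a temporally abstracted view of the dynamics of the level below it.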