Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms, because it determines how a complex learning problem is decomposed into easier subtasks. Recent studies show that representations preserving temporally abstract environment dynamics can solve difficult problems and come with theoretical optimality guarantees. These methods, however, do not scale to tasks where the environment dynamics increase in complexity, i.e., where the temporally abstract transition relations depend on a larger number of variables. Other efforts have instead used spatial abstraction to mitigate these issues, but they scale poorly to high-dimensional environments and depend on prior knowledge. In this paper, we propose a novel three-layer HRL algorithm that introduces both a spatial and a temporal goal abstraction at different levels of the hierarchy. We provide a theoretical study of the regret bounds of the learned policies. We evaluate the approach on complex continuous control tasks, demonstrating the effectiveness of the spatial and temporal abstractions it learns.