This thesis explores reinforcement learning and builds on existing methods to improve learning in high-dimensional, complex environments. It does so by decomposing learning tasks hierarchically, an approach known as Hierarchical Reinforcement Learning. The first chapter introduces the Markov Decision Process framework and presents the recent techniques used in the following chapters. We then build a hierarchical policy learning method to address the limitations of a single primitive policy. The hierarchy consists of a manager agent at the top level and employee agents at the lower level. The last chapter, which is the core of this thesis, attempts to learn the lower-level elements of the hierarchy independently of the manager level, using what are known as "Eigenoptions". Because they are derived from the graph structure of the environment, Eigenoptions allow us to build agents that are aware of the geometric and dynamic properties of the environment. Their decision-making has a special property: it is invariant to symmetric transformations of the environment, which greatly reduces the complexity of the learning task.