Hierarchical reinforcement learning (HRL) is hypothesized to leverage the inherent hierarchy in learning tasks where traditional reinforcement learning (RL) often fails. In this research, HRL is evaluated and contrasted with traditional RL on complex robotic navigation tasks. We examine distinctive characteristics of HRL, including its ability to create sub-goals and its termination functions. We constructed a series of experiments to test: 1) the differences between RL with proximal policy optimization (PPO) and HRL, 2) different ways of creating sub-goals in HRL, 3) manual vs. automatic sub-goal creation in HRL, and 4) the effect of termination frequency on HRL performance. These experiments highlight the advantages of HRL over RL and how it achieves them.