Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse. Recent successes with HRL across different domains provide evidence that practical, effective HRL agents are possible, even if existing agents do not yet fully realize the potential of HRL. Despite these successes, visually complex, partially observable 3D environments remain a challenge for HRL agents. We address this issue with Hierarchical Hybrid Offline-Online (H2O2), a hierarchical deep reinforcement learning agent that discovers and learns to use options from scratch using its own experience. We show that H2O2 is competitive with a strong non-hierarchical Muesli baseline on the DeepMind Hard Eight tasks, and we shed new light on the problem of learning hierarchical agents in complex environments. Our empirical study of H2O2 reveals previously unnoticed practical challenges and brings a new perspective to the current understanding of hierarchical agents in complex domains.