Open-ended learning benefits greatly from symbolic methods for goal representation, as they offer ways to structure knowledge for efficient and transferable learning. However, existing Hierarchical Reinforcement Learning (HRL) approaches that rely on symbolic reasoning are often limited because they require a manually specified goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing that the learned representation is interpretable and transferable and results in data-efficient learning.
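To make the idea of reachability-driven refinement concrete, the following is a minimal sketch, not the algorithm from this paper. It assumes that sets of states are axis-aligned boxes, that the learned transition model is a small ReLU network given as explicit weight lists, and that plain interval arithmetic stands in for the symbolic reachability analysis, which it over-approximates more coarsely. All names here (`reach`, `contained`, `split`, `refine`) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Box = List[Interval]                          # one (low, high) pair per state dimension
Layer = Tuple[List[List[float]], List[float]]  # (weights, bias), assumed format


def affine_bounds(box: Box, weights: List[List[float]], bias: List[float]) -> Box:
    """Sound interval bounds on W x + b for every x inside `box`."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            # A positive weight maps low to low; a negative weight swaps the ends.
            if w >= 0:
                lo, hi = lo + w * l, hi + w * u
            else:
                lo, hi = lo + w * u, hi + w * l
        out.append((lo, hi))
    return out


def reach(box: Box, layers: List[Layer]) -> Box:
    """Over-approximate the network's image of `box` (ReLU between layers)."""
    for i, (weights, bias) in enumerate(layers):
        box = affine_bounds(box, weights, bias)
        if i < len(layers) - 1:               # no activation after the last layer
            box = [(max(0.0, l), max(0.0, u)) for l, u in box]
    return box


def contained(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return all(ol <= il and iu <= ou for (il, iu), (ol, ou) in zip(inner, outer))


def split(box: Box) -> Tuple[Box, Box]:
    """Halve `box` along its widest dimension."""
    widths = [u - l for l, u in box]
    d = widths.index(max(widths))
    l, u = box[d]
    mid = (l + u) / 2
    left, right = list(box), list(box)
    left[d], right[d] = (l, mid), (mid, u)
    return left, right


def refine(region: Box, target: Box, layers: List[Layer], max_depth: int = 4):
    """Split `region` until each part provably reaches `target` or depth runs out.

    Returns a list of (sub_region, provably_reaches_target) pairs.
    """
    if contained(reach(region, layers), target):
        return [(region, True)]
    if max_depth == 0:
        return [(region, False)]
    a, b = split(region)
    return refine(a, target, layers, max_depth - 1) + refine(b, target, layers, max_depth - 1)


# Toy 1-D model f(x) = relu(x) + 0.5 (an assumption for illustration).
toy_layers = [([[1.0]], [0.0]), ([[1.0]], [0.5])]
for part, ok in refine([(0.0, 1.0)], [(1.0, 2.0)], toy_layers, max_depth=1):
    print(part, "reaches target" if ok else "not proven to reach target")
```

In this toy run the region [0, 1] is split once: the half [0.5, 1] provably maps into the target [1, 2], while [0, 0.5] does not. States that behave alike with respect to the target end up grouped together, which is the sense in which refinement can preserve the environment dynamics at the abstract level.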