Open-ended learning benefits immensely from symbolic methods for goal representation, as they offer ways to structure knowledge for efficient and transferable learning. However, existing Hierarchical Reinforcement Learning (HRL) approaches that rely on symbolic reasoning are often limited because they require a manually specified goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing that the learned representation is interpretable and transferable and results in data-efficient learning.
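To make the idea of refining a goal representation through reachability analysis more concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a one-dimensional state space partitioned into intervals (the abstract goals) and a hypothetical one-layer ReLU forward model `relu(W x + b)` standing in for a learned transition network. Interval arithmetic over-approximates the set of states reachable from each region, and a region whose reachable set overlaps more than one region is split at its midpoint. All names (`affine_relu_bounds`, `refine`, the weights, the partition) are assumptions introduced for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def affine_relu_bounds(box: List[Interval],
                       weights: List[List[float]],
                       biases: List[float]) -> List[Interval]:
    """Sound interval bounds for relu(W x + b) over an axis-aligned box."""
    out = []
    for row, b in zip(weights, biases):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, box):
            # A positive weight maps the lower bound to the lower bound;
            # a negative weight swaps the roles of the two bounds.
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                lo += w * x_hi
                hi += w * x_lo
        out.append((max(0.0, lo), max(0.0, hi)))
    return out


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share more than a single boundary point."""
    return a[0] < b[1] and b[0] < a[1]


def reachable_regions(region: Interval,
                      partition: List[Interval],
                      weights: List[List[float]],
                      biases: List[float]) -> List[Interval]:
    """Regions of the partition that the over-approximated image may hit."""
    (image,) = affine_relu_bounds([region], weights, biases)
    return [g for g in partition if overlaps(image, g)]


def refine(partition: List[Interval],
           weights: List[List[float]],
           biases: List[float]) -> List[Interval]:
    """Split every region whose successors span more than one region."""
    refined = []
    for region in partition:
        if len(reachable_regions(region, partition, weights, biases)) > 1:
            mid = (region[0] + region[1]) / 2
            refined += [(region[0], mid), (mid, region[1])]
        else:
            refined.append(region)
    return refined


# Hypothetical forward model: next_state = relu(state + 1).
W, B = [[1.0]], [1.0]
partition = [(0.0, 2.0), (2.0, 4.0)]
new_partition = refine(partition, W, B)
```

In this toy setting, the region `(0.0, 2.0)` is split because its image `(1.0, 3.0)` straddles both regions, whereas `(2.0, 4.0)` is kept because its image only overlaps itself. Midpoint splitting is the simplest possible choice here; a refinement that preserves the dynamics more precisely would split along the boundary that separates states with different successor regions.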