Hierarchical reinforcement learning (HRL) has been a compelling approach for achieving goal-directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach that uses data from humans solving these tasks to softly supervise the goal space for a set of long-range tasks in a 3D embodied environment. In particular, we use unconstrained natural language to parameterize this space. This has two advantages: first, such data are easy to collect from naive human participants; second, language is flexible enough to represent a vast range of sub-goals in human-relevant tasks. Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL trained from scratch without this supervised sub-goal space. Our work thus offers a way to combine human expert supervision with the benefits and flexibility of reinforcement learning.