General-purpose agents will require large repertoires of skills. Empowerment, the maximum mutual information between skills and states, provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four-level agents are able to learn skills that cover a surface area more than two orders of magnitude larger than in prior work.
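For orientation, the quantities named above can be written out as follows. This is a sketch under stated assumptions, not the bound introduced in this work: the symbols $s_0$ (start state), $Z$ (skill), $S_n$ (state reached after $n$ actions), $\pi$ (skill-conditioned policy), and $q_\psi$ (a learned approximate posterior over skills) are notation chosen here for illustration, and the inequality shown is the standard variational (Barber-Agakov style) lower bound on mutual information. The last line assumes, purely for illustration, that each level of a $k$-level hierarchy issues at most $n$ actions to the level below it.

```latex
% Empowerment of a start state s_0: the maximum mutual information
% between skills Z and the states S_n they lead to (notation assumed).
\[
  \mathcal{E}(s_0) \;=\; \max_{p(z \mid s_0),\, \pi} \; I(Z; S_n \mid s_0)
\]

% Standard variational lower bound (not necessarily the bound proposed
% in this work). It holds for any q_psi because the KL divergence from
% the true posterior p(z | s_0, s_n) to q_psi is non-negative.
\[
  I(Z; S_n \mid s_0)
  \;=\; H(Z \mid s_0) - H(Z \mid S_n, s_0)
  \;\ge\; H(Z \mid s_0)
  + \mathbb{E}_{z \sim p(z \mid s_0),\; s_n \sim p(s_n \mid s_0, z)}
    \bigl[ \log q_\psi(z \mid s_0, s_n) \bigr]
\]

% Illustrative horizon arithmetic (assumed structure): if each of k
% levels issues at most n actions to the level below, the top level
% spans up to n^k primitive actions, e.g. n = 10, k = 4 gives 10^4.
\[
  T_k \;=\; n^{k}
\]
```

The bound is tight when $q_\psi$ matches the true posterior over skills, which is why maximizing it with respect to both $q_\psi$ and the policy yields an estimate of empowerment. The horizon formula only illustrates why stacking levels lengthens the time scale exponentially in the number of levels.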