General-purpose agents will require large repertoires of skills. Empowerment, the maximum mutual information between skills and states, provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four-level agents learn skills that cover a surface area more than two orders of magnitude larger than the area covered by prior work.
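The abstract does not state the form of its new variational bound. As background only, the sketch below computes the standard Barber-Agakov lower bound, I(Z; S) >= E_{p(z) p(s|z)}[log q(z|s) - log p(z)], which holds for any variational posterior q. It assumes small discrete distributions over skills Z and final states S, drops the conditioning on the start state, and fixes the skill distribution p(z). Empowerment would additionally maximize over that distribution. The function names `mutual_information` and `barber_agakov_bound` are hypothetical and illustrative. This is not the paper's bound.

```python
import math


def mutual_information(p_z, p_s_given_z):
    """Exact I(Z; S) in nats for discrete skills Z and final states S."""
    n_s = len(p_s_given_z[0])
    # Marginal over final states: p(s) = sum_z p(z) p(s|z)
    p_s = [sum(p_z[z] * p_s_given_z[z][s] for z in range(len(p_z)))
           for s in range(n_s)]
    mi = 0.0
    for z, pz in enumerate(p_z):
        for s in range(n_s):
            joint = pz * p_s_given_z[z][s]
            if joint > 0:
                mi += joint * math.log(p_s_given_z[z][s] / p_s[s])
    return mi


def barber_agakov_bound(p_z, p_s_given_z, q_z_given_s):
    """E_{p(z)p(s|z)}[log q(z|s) - log p(z)], a lower bound on I(Z; S).

    q_z_given_s[s][z] is a variational posterior (hypothetical "skill
    discriminator"); it must be positive wherever p(z)p(s|z) > 0.
    """
    bound = 0.0
    for z, pz in enumerate(p_z):
        for s, ps in enumerate(p_s_given_z[z]):
            joint = pz * ps
            if joint > 0:
                bound += joint * (math.log(q_z_given_s[s][z]) - math.log(pz))
    return bound


# Toy example: two equally likely skills, each usually reaching its own state.
p_z = [0.5, 0.5]
p_s_given_z = [[0.9, 0.1], [0.1, 0.9]]
exact_posterior = [[0.9, 0.1], [0.1, 0.9]]  # p(z|s), indexed [s][z]
crude_q = [[0.7, 0.3], [0.3, 0.7]]

mi = mutual_information(p_z, p_s_given_z)
print(mi, barber_agakov_bound(p_z, p_s_given_z, exact_posterior))
print(barber_agakov_bound(p_z, p_s_given_z, crude_q))
```

The bound is tight exactly when q equals the true posterior p(z|s). With a cruder q it stays below the true mutual information, which is why such bounds can be maximized jointly over the skill policy and q.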