We focus on learning composable policies to control a variety of physical agents with possibly different structures. Among state-of-the-art methods, prominent approaches exploit graph-based representations and weight-sharing modular policies based on the message-passing framework. However, as shown by recent literature, message passing can create bottlenecks in information propagation and hinder global coordination. This drawback can become even more problematic in tasks where high-level planning is crucial. In such scenarios, each modular policy (e.g., one controlling a single joint of a robot) would need to coordinate not only for basic locomotion but also to achieve high-level goals, such as navigating a maze. A classical solution to avoid such pitfalls is to resort to hierarchical decision-making. In this work, we adopt the Feudal Reinforcement Learning paradigm to develop agents where control actions are the outcome of a hierarchical (pyramidal) message-passing process. In the proposed Feudal Graph Reinforcement Learning (FGRL) framework, high-level decisions at the top level of the hierarchy are propagated through a layered graph representing a hierarchy of policies. Lower layers mimic the morphology of the physical system, while upper layers can capture more abstract sub-modules. The purpose of this preliminary work is to formalize the framework and provide proof-of-concept experiments on benchmark environments (MuJoCo locomotion tasks). Empirical evaluation shows promising results on both standard benchmarks and zero-shot transfer learning settings.
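To make the pyramidal propagation concrete, the following is a minimal sketch of a top-down pass through a layered hierarchy, not the FGRL implementation itself. Everything in it is an assumption made for illustration: the three-level toy hierarchy (`manager`, two limb-level sub-managers, four joint-level workers), the node names, the vector size `DIM`, the per-node observations, and the single-layer `tanh` policies with randomly initialized weights shared within each level. Training, rewards, and any bottom-up aggregation of observations are omitted.

```python
import math
import random

# Hypothetical 3-level hierarchy for a toy 4-joint agent (illustrative only):
# one top-level manager, two limb-level sub-managers, four joint-level workers.
PARENT = {
    "limb_left": "manager", "limb_right": "manager",
    "joint_0": "limb_left", "joint_1": "limb_left",
    "joint_2": "limb_right", "joint_3": "limb_right",
}
LEVEL = {
    "manager": 2, "limb_left": 1, "limb_right": 1,
    "joint_0": 0, "joint_1": 0, "joint_2": 0, "joint_3": 0,
}
DIM = 3  # assumed size of each observation and goal vector


def make_weights(rng, n_in, n_out):
    """Random weight matrix standing in for a learned policy."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def linear_tanh(weights, x):
    """One linear layer followed by tanh, so outputs lie in (-1, 1)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def feudal_forward(obs, weights):
    """Propagate goals from the top level down; return one action per joint."""
    goals, actions = {}, {}
    # Visit higher levels first so a parent's goal exists before its children run.
    for node in sorted(LEVEL, key=LEVEL.get, reverse=True):
        # The root has no parent, so it receives a zero goal vector.
        parent_goal = goals[PARENT[node]] if node in PARENT else [0.0] * DIM
        # Each policy sees its own observation concatenated with the parent's goal.
        out = linear_tanh(weights[LEVEL[node]], obs[node] + parent_goal)
        if LEVEL[node] == 0:
            actions[node] = out[0]   # joint-level workers emit a scalar action
        else:
            goals[node] = out        # upper levels emit a goal for their children
    return actions


rng = random.Random(0)
# Weights are shared by all nodes on the same level (modular, weight-sharing policies).
weights = {
    2: make_weights(rng, 2 * DIM, DIM),
    1: make_weights(rng, 2 * DIM, DIM),
    0: make_weights(rng, 2 * DIM, 1),
}
obs = {node: [rng.uniform(-1, 1) for _ in range(DIM)] for node in LEVEL}
actions = feudal_forward(obs, weights)
```

In this sketch, changing only the observation given to `manager` alters every joint action, because each action depends on the chain of goals handed down from the top of the hierarchy.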