Learning a shared policy that guides the locomotion of different agents is of core interest in Reinforcement Learning (RL) and has motivated the study of morphology-agnostic RL. However, existing benchmarks are highly restrictive in the choice of starting and target points, constraining the agents' movement to 2D space. In this work, we propose a novel setup for morphology-agnostic RL, dubbed Subequivariant Graph RL in 3D environments (3D-SGRL). Specifically, we first introduce a new set of more practical yet challenging benchmarks in 3D space that allow the agent to explore in arbitrary directions, with full degrees of freedom, starting from arbitrary configurations. Moreover, to optimize the policy over the enlarged state-action space, we propose to inject geometric symmetry, namely subequivariance (equivariance to the rotations and reflections that leave the direction of gravity unchanged), into the modeling of the policy and Q-function so that the policy can generalize to all directions, improving exploration efficiency. This goal is achieved by a novel SubEquivariant Transformer (SET) that permits expressive message exchange. Finally, we evaluate our method on the proposed benchmarks, where it consistently and significantly outperforms existing approaches in single-task, multi-task, and zero-shot generalization scenarios. Extensive ablations are also conducted to verify our design. Code and videos are available on our project page: https://alpc91.github.io/SGRL/.
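To make "subequivariance" concrete, the sketch below shows one simple way to build a map that is equivariant to rotations about the gravity axis but not to arbitrary 3D rotations. It is not the SubEquivariant Transformer (SET) from this work; it is a minimal Gram-matrix construction in the spirit of subequivariant graph networks, under stated assumptions: gravity points along the negative z axis, and the function name `subequivariant_layer` and its random scalar weights are hypothetical. The gravity vector `g` is appended to the input vectors as an extra column. Coefficients are computed only from pairwise inner products, which are invariant to any orthogonal transform that fixes `g`, and the output is a linear combination of the input columns. As a result, the output rotates together with the inputs.

```python
import math
import random

# Assumption: gravity points along the negative z axis.
GRAVITY = (0.0, 0.0, -1.0)


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def rot_z(theta):
    """3x3 rotation matrix about the z (gravity) axis; it leaves GRAVITY unchanged."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(m, v):
    """Apply a 3x3 matrix (list of rows) to a 3D vector."""
    return tuple(dot(row, v) for row in m)


def subequivariant_layer(vectors, weights, g=GRAVITY):
    """Hypothetical layer mapping a list of 3D vectors to one 3D vector.

    Z = [x_1, ..., x_m, g]. The coefficients depend only on the Gram matrix
    Z^T Z, which is unchanged by any orthogonal transform that fixes g.
    The output is a linear combination of the columns of Z, so it transforms
    exactly like the inputs under those transforms.
    """
    z = list(vectors) + [g]
    gram = [dot(a, b) for a in z for b in z]  # invariant scalar features
    # Hypothetical scalar network: one tanh unit per column of Z.
    coeffs = [math.tanh(dot(w, gram)) for w in weights]
    return tuple(sum(c * col[i] for c, col in zip(coeffs, z)) for i in range(3))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(4)]
    n = len(xs) + 1  # number of columns in Z, including gravity
    ws = [[rng.uniform(-0.5, 0.5) for _ in range(n * n)] for _ in range(n)]
    r = rot_z(0.7)
    # Rotating the inputs about gravity should rotate the output identically.
    lhs = subequivariant_layer([matvec(r, x) for x in xs], ws)
    rhs = matvec(r, subequivariant_layer(xs, ws))
    print(all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)))  # True
```

Fixing `g` is what reduces full O(3) equivariance to the gravity-preserving subgroup. If `g` were rotated together with the inputs, the same construction would be equivariant to every 3D rotation and reflection.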