Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control. However, it poses a challenging multi-task reinforcement learning problem, as the optimal policy may be quite different across robots and critically depend on the morphology. Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies, but pay little attention to the dependency of a robot's control policy on its morphology context. In this paper, we propose a hierarchical architecture to better model this dependency via contextual modulation, which includes two key submodules: (1) Instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) We propose a morphology-dependent attention mechanism to modulate the interactions between different limbs in a robot. Experimental results show that our method not only improves learning performance on a diverse set of training robots, but also generalizes better to unseen morphologies in a zero-shot fashion.
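To make the two submodules concrete, here is a minimal NumPy sketch of the general idea, not the paper's implementation. It rests on several assumptions that the abstract does not confirm: each limb has a fixed-size morphology context vector, the hypernetwork is a single linear map from that context to a flattened weight matrix, and the attention weights are computed from the context alone. All names (`HyperLinear`, `morphology_attention`, `run_demo`) and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class HyperLinear:
    """Linear layer whose weights are generated per limb from its context."""

    def __init__(self, ctx_dim, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim
        # Assumed hypernetwork: one linear map, context -> flattened weights.
        self.H = rng.normal(0.0, 0.1, size=(ctx_dim, in_dim * out_dim))

    def __call__(self, ctx, x):
        # ctx: (n_limbs, ctx_dim), x: (n_limbs, in_dim)
        W = (ctx @ self.H).reshape(-1, self.in_dim, self.out_dim)
        # Apply each limb's own generated weight matrix to its input.
        return np.einsum("ni,nio->no", x, W)

def morphology_attention(ctx, h, Wq, Wk):
    # Assumption: queries and keys come from the morphology context only,
    # so the limb-to-limb interaction pattern is set by the morphology.
    q, k = ctx @ Wq, ctx @ Wk
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ h, attn

CTX, OBS, HID, ACT = 4, 6, 8, 1  # illustrative sizes
encoder = HyperLinear(CTX, OBS, HID)
decoder = HyperLinear(CTX, HID, ACT)
Wq = rng.normal(0.0, 0.1, size=(CTX, HID))
Wk = rng.normal(0.0, 0.1, size=(CTX, HID))

def run_demo(n_limbs):
    # Random stand-ins for per-limb morphology context and observations.
    ctx = rng.normal(size=(n_limbs, CTX))
    obs = rng.normal(size=(n_limbs, OBS))
    h = np.tanh(encoder(ctx, obs))
    h, attn = morphology_attention(ctx, h, Wq, Wk)
    actions = np.tanh(decoder(ctx, h))  # one bounded action per limb
    return actions, attn

for n in (3, 5):
    actions, attn = run_demo(n)
    print(n, actions.shape, attn.shape)
```

The same parameters (`encoder.H`, `decoder.H`, `Wq`, `Wk`) serve robots with different numbers of limbs, while the weights actually applied to each limb vary with its context. That is the contrast with hard parameter sharing that the abstract draws.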