Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control. However, it poses a challenging multi-task reinforcement learning problem, because the optimal policy may differ substantially across robots and depends critically on morphology. Existing methods use graph neural networks or transformers to handle the heterogeneous state and action spaces of different morphologies, but pay little attention to how a robot's control policy depends on its morphology context. In this paper, we propose a hierarchical architecture that better models this dependency via contextual modulation. It includes two key submodules: (1) instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) we propose a fixed attention mechanism that depends solely on morphology to modulate the interactions between different limbs of a robot. Experimental results show that our method not only improves learning performance on a diverse set of training robots but also generalizes better to unseen morphologies in a zero-shot fashion.
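The two submodules can be illustrated with a small NumPy sketch. It is not the authors' implementation: the single-linear-layer hypernetworks, the tanh activations, the residual attention update, the one-scalar-action-per-limb decoder, and every name below (`ContextModulatedPolicy`, `H_enc`, `H_dec`, `ctx`, `states`) are hypothetical choices made only for illustration. What the sketch does show is the two ideas: each limb's encoder and decoder weights are generated from that limb's morphology context rather than shared, and the limb-to-limb attention matrix is computed from morphology context alone, so it stays fixed for a given robot regardless of its state.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ContextModulatedPolicy:
    """Hypothetical toy per-limb policy: parameters and limb-to-limb
    attention are conditioned on morphology context (sketch only)."""

    def __init__(self, ctx_dim, state_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.state_dim, self.hidden_dim = state_dim, hidden_dim
        scale = 0.1
        # Hypernetwork for the limb encoder: context -> flattened (W_enc, b_enc).
        n_enc = hidden_dim * state_dim + hidden_dim
        self.H_enc = rng.normal(0.0, scale, (ctx_dim, n_enc))
        # Hypernetwork for the limb decoder: context -> (w_dec, b_dec), one action per limb.
        self.H_dec = rng.normal(0.0, scale, (ctx_dim, hidden_dim + 1))
        # Query/key projections see ONLY the morphology context.
        self.Wq = rng.normal(0.0, scale, (ctx_dim, hidden_dim))
        self.Wk = rng.normal(0.0, scale, (ctx_dim, hidden_dim))
        # Value projection acts on the state-dependent limb embeddings.
        self.Wv = rng.normal(0.0, scale, (hidden_dim, hidden_dim))

    def attention(self, ctx):
        # ctx: (n_limbs, ctx_dim). The result depends on morphology only,
        # so it is constant for a given robot across all time steps.
        q, k = ctx @ self.Wq, ctx @ self.Wk
        return softmax(q @ k.T / np.sqrt(self.hidden_dim), axis=-1)

    def act(self, ctx, states):
        # ctx: (n_limbs, ctx_dim), states: (n_limbs, state_dim) -> actions: (n_limbs,)
        n = ctx.shape[0]
        hs = self.hidden_dim * self.state_dim
        enc = ctx @ self.H_enc
        W = enc[:, :hs].reshape(n, self.hidden_dim, self.state_dim)
        b = enc[:, hs:]
        # Per-limb encoder whose weights were generated from that limb's context.
        h = np.tanh(np.einsum("nhs,ns->nh", W, states) + b)
        # Mix limb embeddings using the morphology-only attention weights.
        h = h + self.attention(ctx) @ (h @ self.Wv)
        # Per-limb decoder, again with context-generated weights.
        dec = ctx @ self.H_dec
        w_out, b_out = dec[:, :-1], dec[:, -1]
        return np.tanh(np.sum(w_out * h, axis=-1) + b_out)


# Usage on a hypothetical 5-limb robot with random context and state features.
policy = ContextModulatedPolicy(ctx_dim=4, state_dim=6, hidden_dim=8)
rng = np.random.default_rng(1)
ctx = rng.normal(size=(5, 4))
states = rng.normal(size=(5, 6))
actions = policy.act(ctx, states)  # one scalar action per limb, shape (5,)
```

Because the attention weights never see the state, they could be computed once per robot and cached. Since every per-limb operation is driven by that limb's own context, the same parameters apply to robots with any number of limbs.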