Reinforcement learning (RL) has achieved remarkable success in controlling complex robotic systems (e.g., quadruped locomotion). In previous work, the RL-based controller is typically implemented as a single neural network that takes the concatenated observations as input. However, the resulting policy is highly task-specific. Because all motors are controlled centrally, an out-of-distribution local observation can affect every motor through the single, coupled policy network. In contrast, animals and humans can control their limbs separately. Inspired by this biological phenomenon, we propose a Decentralized motor skill (DEMOS) learning algorithm that automatically discovers motor groups that can be decoupled from each other while preserving essential connections, and then learns a decentralized motor control policy. Our method improves the robustness and generalization of the policy without sacrificing performance. Experiments on quadruped and humanoid robots demonstrate that the learned policy is robust against local motor malfunctions and can be transferred to new tasks.
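The coupling argument above can be made concrete with a toy comparison. The sketch below is illustrative only and is not the DEMOS algorithm: it assumes a hypothetical robot with 4 motors, 3 local observation features per motor, untrained random two-layer networks, and a fixed, hand-chosen grouping (`GROUPS`). DEMOS instead discovers the groups automatically and keeps essential inter-group connections, which this sketch does not model. All names here (`centralized_policy`, `decentralized_policy`, `GROUPS`, etc.) are hypothetical. It injects an out-of-distribution reading on one motor and measures how much each motor's action changes.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 4        # hypothetical robot size
OBS_PER_MOTOR = 3   # hypothetical number of local features per motor
HIDDEN = 16


def mlp_params(in_dim, out_dim, hidden=HIDDEN):
    """Random (untrained) weights for a two-layer tanh network."""
    return rng.normal(size=(in_dim, hidden)), rng.normal(size=(hidden, out_dim))


def mlp(params, x):
    w1, w2 = params
    return np.tanh(np.tanh(x @ w1) @ w2)


# Centralized: one network maps the concatenated observations to all actions.
central = mlp_params(N_MOTORS * OBS_PER_MOTOR, N_MOTORS)


def centralized_policy(obs):
    return mlp(central, obs.reshape(-1))


# Decentralized (toy): a fixed, hypothetical grouping; each group's network
# sees only its own motors' observations and outputs only their actions.
GROUPS = [[0, 1], [2, 3]]
group_nets = [mlp_params(len(g) * OBS_PER_MOTOR, len(g)) for g in GROUPS]


def decentralized_policy(obs):
    actions = np.zeros(N_MOTORS)
    for g, params in zip(GROUPS, group_nets):
        actions[g] = mlp(params, obs[g].reshape(-1))
    return actions


obs = rng.normal(size=(N_MOTORS, OBS_PER_MOTOR))
faulty = obs.copy()
faulty[3] += 50.0  # out-of-distribution reading on motor 3 only

dc = np.abs(centralized_policy(faulty) - centralized_policy(obs))
dd = np.abs(decentralized_policy(faulty) - decentralized_policy(obs))
print("centralized   action change per motor:", dc.round(3))
print("decentralized action change per motor:", dd.round(3))
```

In the decentralized toy policy, motors 0 and 1 belong to a group that never sees motor 3's observation, so their actions are unchanged by construction; in the centralized policy the faulty reading propagates to every output. This only illustrates the isolation property; it says nothing about task performance, which is what learning the grouping and the preserved connections is meant to address.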