This work presents a meta-reinforcement learning approach for developing a universal locomotion control policy capable of zero-shot generalization across diverse quadrupedal platforms. The proposed method trains an RL agent equipped with a memory unit to imitate reference motions using a small set of procedurally generated quadruped robots. Through comprehensive simulation and real-world hardware experiments, we demonstrate that the approach achieves locomotion across various robots without robot-specific fine-tuning. Furthermore, we highlight the critical role of the memory unit in enabling generalization, facilitating rapid adaptation to changes in robot properties, and improving sample efficiency.