ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots

Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, with hyperparameters and reward function weights re-tuned to maximize performance on each new system. Alternatively, training a single policy to accommodate different robot sizes, while keeping the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks or randomization of mass, inertia, and dimensions, which prolongs training. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robots differ in their number of DoFs (i.e., 12 or 16 joints), cover three distinct morphologies, span a mass range from 2 kg to 200 kg, and have nominal standing heights from 16 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, coordinating both the frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only component that varies is the PF layer, which adjusts the scaling parameters for stride height and length. We then evaluate sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
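
The split between Rhythm Generation and Pattern Formation described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The oscillator dynamics (a critically damped amplitude plus a phase integrator), the foot-trajectory mapping, and every name and number below except the 16 cm and 100 cm standing heights are assumptions made for illustration. The constant `mu` and `freq_hz` stand in for the amplitude and frequency commands that the learned policy would output.

```python
import math
from dataclasses import dataclass


@dataclass
class Oscillator:
    """Rhythm generation for one leg: amplitude r and phase theta (assumed form)."""
    r: float = 0.0
    r_dot: float = 0.0
    theta: float = 0.0

    def step(self, mu, freq_hz, dt=0.001, a=50.0):
        # Critically damped second-order dynamics pull r toward the commanded amplitude mu.
        r_ddot = a * (a / 4.0 * (mu - self.r) - self.r_dot)
        self.r_dot += r_ddot * dt
        self.r += self.r_dot * dt
        # The phase advances at the commanded frequency and wraps at 2*pi.
        self.theta = (self.theta + 2.0 * math.pi * freq_hz * dt) % (2.0 * math.pi)


@dataclass
class PatternFormation:
    """Per-robot scaling layer; field names are hypothetical."""
    stride_length: float    # metres, fore-aft foot excursion at r = 1
    stride_height: float    # metres, foot lift during swing at r = 1
    standing_height: float  # metres, nominal hip height above the ground

    def foot_target(self, osc):
        # Map the shared rhythmic state to a foot position in the hip frame.
        x = self.stride_length * osc.r * math.cos(osc.theta)
        lift = max(0.0, math.sin(osc.theta))  # lift the foot only in the swing half-cycle
        z = -self.standing_height + self.stride_height * osc.r * lift
        return x, z


def rollout(pf, mu=1.0, freq_hz=2.0, steps=2000):
    """Run one oscillator with fixed commands and record the foot targets."""
    osc = Oscillator()
    trajectory = []
    for _ in range(steps):
        osc.step(mu, freq_hz)
        trajectory.append(pf.foot_target(osc))
    return osc, trajectory


# Same rhythm generator, two hypothetical robots at the extremes of the height range.
small = PatternFormation(stride_length=0.04, stride_height=0.02, standing_height=0.16)
large = PatternFormation(stride_length=0.25, stride_height=0.12, standing_height=1.00)
```

Calling `rollout(small)` and `rollout(large)` drives both robots from identical oscillator states. Only the `PatternFormation` parameters differ, which mirrors the claim that the PF layer is the sole robot-specific component.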

