Thanks to the recent rapid development of data-driven learning methods, reinforcement learning (RL) has emerged as a promising approach to the legged locomotion problem in robotics. In this manuscript, we propose a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains that relies only on proprioceptive measurements in real-world deployment. Unlike the conventional teacher-student architecture, which trains the teacher policy via RL and then transfers its knowledge to the student policy through supervised learning, our architecture trains the teacher and student policy networks concurrently under the reinforcement learning paradigm. To achieve this, we develop a new training scheme based on the conventional proximal policy optimization (PPO) method that accommodates the interaction between the teacher policy network and the student policy network. The effectiveness of the proposed architecture and the new training scheme is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and a point-foot bipedal robot, which show robust locomotion over challenging terrains and improved performance compared with two-stage training methods.