We investigate whether Deep Reinforcement Learning (Deep RL) can synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot, and whether these skills can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapt to specific game contexts in a way that would be impractical to design manually. The agent also developed a basic strategic understanding of the game and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way, well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer-term objectives.