Learning diverse motor skills for quadrupedal robots is challenging because it typically requires careful design of task-specific mathematical models or reward descriptions. In this work, we propose to learn a single capable policy with deep reinforcement learning by imitating a large number of reference motions, including walking, turning, pacing, jumping, sitting, and lying down. Building on the existing motion imitation framework, we first carefully design the observation space, the action space, and the reward function to improve both the scalability of learning and the robustness of the final policy. In addition, we introduce a novel adaptive motion sampling (AMS) method that maintains a balance between successful and unsuccessful behaviors. This technique allows the learning algorithm to focus on challenging motor skills while avoiding catastrophic forgetting. We demonstrate that the learned policy exhibits diverse behaviors in simulation by successfully tracking both motions from the training dataset and out-of-distribution trajectories. We also validate the importance of the proposed learning formulation and the adaptive motion sampling scheme through comparative experiments.
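The abstract does not spell out the AMS sampling rule, so the following is only a minimal sketch of one way to balance successful and unsuccessful behaviors. It is not the authors' implementation. The class name `AdaptiveMotionSampler`, the `floor` and `smoothing` parameters, and the weighting rule (failure rate plus a constant floor) are all assumptions made for illustration. The floor keeps already-mastered motions in the sampling mix, which is one plausible way to guard against catastrophic forgetting.

```python
import random


class AdaptiveMotionSampler:
    """Hypothetical sampler that favors reference motions the policy still fails on."""

    def __init__(self, motion_names, floor=0.1, smoothing=0.9, seed=0):
        self.names = list(motion_names)
        self.floor = floor          # assumed: minimum weight so mastered motions are revisited
        self.smoothing = smoothing  # assumed: decay of the running success estimate
        # Start pessimistic: every motion is treated as not yet mastered.
        self.success = {name: 0.0 for name in self.names}
        self.rng = random.Random(seed)

    def weights(self):
        """Return normalized sampling probabilities, in the order of self.names."""
        raw = [self.floor + (1.0 - self.success[name]) for name in self.names]
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self):
        """Draw the reference motion for the next training episode."""
        return self.rng.choices(self.names, weights=self.weights(), k=1)[0]

    def update(self, name, succeeded):
        """Blend the latest episode outcome into the running success estimate."""
        s = self.smoothing
        self.success[name] = s * self.success[name] + (1.0 - s) * float(succeeded)


# Usage: "walk" keeps succeeding, "jump" keeps failing, "turn" is never evaluated.
sampler = AdaptiveMotionSampler(["walk", "turn", "jump"])
for _ in range(50):
    sampler.update("walk", True)
    sampler.update("jump", False)
probs = dict(zip(sampler.names, sampler.weights()))
next_motion = sampler.sample()
```

Under these assumptions, the motion that keeps failing receives a larger sampling probability than the one that keeps succeeding, while the succeeding motion retains a small nonzero probability and so continues to appear during training.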