Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach toward this goal: a deep reinforcement learning method based on the simulation of a single-rigid-body character. By using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we obtain a policy that adapts to various unobserved environmental changes and controller transitions without any additional learning. Because the state and action spaces have reduced dimensionality, the learning process is sample-efficient. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. The final full-body motion is then generated kinematically, in a physically plausible way, from the state of the simulated SRB character. We demonstrate that our policy, trained within 30 minutes on an ultraportable laptop, copes with environments not experienced during learning, such as running on uneven terrain or pushing a box, and with transitions between learned policies, all without any additional learning.
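To make the single-rigid-body idea concrete, the following minimal sketch evaluates the standard Newton-Euler equations for a rigid body acted on by gravity and a set of contact forces. It is an illustration under stated assumptions, not the formulation used in this work: the QP that selects the contact forces is omitted, the forces are simply given as inputs, angular integration is left out, and all names (`srb_wrench`, `step_linear`, the y-up gravity convention) are hypothetical.

```python
# Minimal sketch of single-rigid-body (SRB) dynamics, assuming contact forces
# are already known. All names here are hypothetical; the QP that would choose
# the forces is not shown.

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def srb_wrench(com, contacts, forces, mass, gravity=(0.0, -9.81, 0.0)):
    """Return (net force, net torque about the center of mass).

    Net force  = m * g + sum_i f_i
    Net torque = sum_i (p_i - com) x f_i
    """
    net_force = scale(gravity, mass)
    net_torque = (0.0, 0.0, 0.0)
    for p, f in zip(contacts, forces):
        net_force = add(net_force, f)
        net_torque = add(net_torque, cross(sub(p, com), f))
    return net_force, net_torque

def step_linear(com, vel, net_force, mass, dt):
    """Advance center-of-mass position and velocity by one semi-implicit Euler step."""
    vel = add(vel, scale(net_force, dt / mass))
    com = add(com, scale(vel, dt))
    return com, vel

# Usage: a 60 kg body standing on two symmetric foot contacts, each carrying
# half of the body weight, so the net force and torque should both vanish.
mass = 60.0
com = (0.0, 1.0, 0.0)
contacts = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)]
half_weight = 0.5 * mass * 9.81
forces = [(0.0, half_weight, 0.0), (0.0, half_weight, 0.0)]

net_force, net_torque = srb_wrench(com, contacts, forces, mass)
new_com, new_vel = step_linear(com, (0.0, 0.0, 0.0), net_force, mass, dt=1.0 / 60.0)
```

Because the state is only the pose and velocity of one body and the inputs are a handful of contact forces, the state and action spaces are far smaller than those of a full articulated character, which is the source of the sample efficiency described above.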