Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach to this goal: a deep reinforcement learning method based on the simulation of a single-rigid-body (SRB) character. We use the centroidal dynamics model (CDM) to express the full-body character as an SRB and train a policy to track a reference motion. The resulting policy adapts to various unobserved environmental changes and controller transitions without any additional learning. Owing to the reduced dimensionality of the state and action spaces, the learning process is sample-efficient. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. The final full-body motion is then generated kinematically, in a physically plausible way, from the state of the simulated SRB character. We demonstrate that our policy, trained within 30 minutes on an ultraportable laptop, copes with environments not experienced during learning, such as running on uneven terrain or pushing a box, and with transitions between learned policies.
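To make the single-rigid-body abstraction concrete, the following minimal sketch evaluates the standard Newton-Euler equations for one rigid body acted on by gravity and a set of contact forces. It is an illustration under stated assumptions, not the QP formulation used in this work: the function name `srb_accelerations`, the y-up gravity vector, and the use of a world-frame inertia tensor are all hypothetical choices made for the example.

```python
import numpy as np

# Assumption: y-up world frame with standard gravity.
GRAVITY = np.array([0.0, -9.81, 0.0])


def srb_accelerations(mass, inertia_world, com, ang_vel,
                      contact_points, contact_forces):
    """Linear and angular acceleration of a single rigid body.

    Hypothetical helper for illustration only.
    mass           : scalar body mass
    inertia_world  : (3, 3) inertia tensor expressed in the world frame
    com            : (3,) center-of-mass position
    ang_vel        : (3,) angular velocity in the world frame
    contact_points : (n, 3) world positions where forces are applied
    contact_forces : (n, 3) world-frame forces at those points
    """
    com = np.asarray(com, dtype=float)
    ang_vel = np.asarray(ang_vel, dtype=float)
    points = np.asarray(contact_points, dtype=float).reshape(-1, 3)
    forces = np.asarray(contact_forces, dtype=float).reshape(-1, 3)

    # Newton: net force is the sum of contact forces plus gravity.
    total_force = forces.sum(axis=0) + mass * GRAVITY
    lin_acc = total_force / mass

    # Euler: torque about the center of mass, including the gyroscopic term.
    total_torque = np.cross(points - com, forces).sum(axis=0)
    gyroscopic = np.cross(ang_vel, inertia_world @ ang_vel)
    ang_acc = np.linalg.solve(inertia_world, total_torque - gyroscopic)

    return lin_acc, ang_acc
```

In a controller of the kind described above, the contact forces would be unknowns chosen by an optimizer subject to constraints such as friction limits; here they are simply passed in as inputs.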