Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach toward this goal: a deep reinforcement learning method based on the simulation of a single-rigid-body character. By using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we obtain a policy that can adapt to various unobserved environmental changes and controller transitions without any additional learning. Because the state and action spaces have reduced dimension, the learning process is sample-efficient. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. The final full-body motion is then generated kinematically, in a physically plausible way, from the state of the simulated SRB character. We demonstrate that our policy, trained within 30 minutes on an ultraportable laptop, copes with environments not experienced during learning, such as running on uneven terrain or pushing a box, and with transitions between learned policies.
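To make the single-rigid-body idea concrete, the following is a minimal sketch of textbook SRB dynamics, not the formulation used in this work. It assumes the contact forces are already known (here they are simply passed in, whereas the abstract describes obtaining the simulation step from a QP), assumes a diagonal inertia tensor aligned with the world axes, and omits orientation integration. All names (`SRBState`, `srb_accelerations`, `srb_step`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


@dataclass
class SRBState:
    """Hypothetical reduced state: centre of mass plus linear/angular velocity."""
    com: Vec3
    lin_vel: Vec3
    ang_vel: Vec3


def srb_accelerations(
    state: SRBState,
    mass: float,
    inertia_diag: Vec3,
    contacts: List[Tuple[Vec3, Vec3]],
    gravity: Vec3 = (0.0, -9.81, 0.0),
) -> Tuple[Vec3, Vec3]:
    """Return (linear, angular) acceleration of a single rigid body.

    `contacts` is a list of (contact_point, force) pairs in world coordinates.
    Assumption: the inertia tensor is diagonal and world-aligned.
    """
    total_force = scale(gravity, mass)
    total_torque: Vec3 = (0.0, 0.0, 0.0)
    for point, force in contacts:
        total_force = add(total_force, force)
        # Torque about the centre of mass: (p - c) x f
        total_torque = add(total_torque, cross(sub(point, state.com), force))

    lin_acc = scale(total_force, 1.0 / mass)

    # Euler's equation: I * dw/dt = tau - w x (I * w)
    i_w = (
        inertia_diag[0] * state.ang_vel[0],
        inertia_diag[1] * state.ang_vel[1],
        inertia_diag[2] * state.ang_vel[2],
    )
    gyro = cross(state.ang_vel, i_w)
    ang_acc = (
        (total_torque[0] - gyro[0]) / inertia_diag[0],
        (total_torque[1] - gyro[1]) / inertia_diag[1],
        (total_torque[2] - gyro[2]) / inertia_diag[2],
    )
    return lin_acc, ang_acc


def srb_step(
    state: SRBState,
    mass: float,
    inertia_diag: Vec3,
    contacts: List[Tuple[Vec3, Vec3]],
    dt: float,
) -> SRBState:
    """Advance the state by one semi-implicit Euler step (orientation omitted)."""
    lin_acc, ang_acc = srb_accelerations(state, mass, inertia_diag, contacts)
    lin_vel = add(state.lin_vel, scale(lin_acc, dt))
    ang_vel = add(state.ang_vel, scale(ang_acc, dt))
    com = add(state.com, scale(lin_vel, dt))
    return SRBState(com=com, lin_vel=lin_vel, ang_vel=ang_vel)
```

The sketch illustrates why the reduced model keeps the state and action spaces small: the body is described by a handful of vectors rather than per-joint quantities, and the only inputs are contact points and forces.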