Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence caused by hard exploration. In practice, they usually take tens of hours or even days of training to mimic a simple motion sequence, which limits their scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task as a much simpler state matching problem. In particular, with DPS a stable policy can be learned from analytical gradients that carry ground-truth physical priors, which leads to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we use a Demonstration Replay mechanism that enables stable gradient backpropagation over a long horizon. Extensive experiments on standard benchmarks show that DiffMimic achieves better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn a Backflip after 10 minutes of training and to cycle it after 3 hours of training, whereas the existing approach may require about a day of training to cycle Backflip. More importantly, we hope DiffMimic can benefit future differentiable animation systems that incorporate techniques such as differentiable cloth simulation.
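The core idea above (policy learning recast as state matching, optimized with analytical gradients through the simulator, with Demonstration Replay cutting long gradient chains) can be illustrated with a deliberately tiny sketch. Everything here is a hypothetical simplification, not DiffMimic's implementation: a 1D unit-mass point mass stands in for the character, a per-timestep action sequence stands in for the policy network, the simulator is a hand-differentiated semi-implicit Euler step rather than a DPS library, and the replay rule (restart from the demonstration state when the position error exceeds `replay_threshold`) is an assumed threshold-based variant. The names `step`, `make_reference`, `loss_and_grad`, `train`, and `replay_threshold` are illustrative only.

```python
import math
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity) of a 1D point mass


def step(s: State, a: float, dt: float) -> State:
    """One semi-implicit Euler step of a unit-mass point mass pushed by force a."""
    x, v = s
    v_next = v + dt * a
    return (x + dt * v_next, v_next)


def make_reference(actions: List[float], s0: State, dt: float) -> List[State]:
    """Roll out a stand-in 'demonstration' trajectory from known actions."""
    ref = [s0]
    for a in actions:
        ref.append(step(ref[-1], a, dt))
    return ref


def loss_and_grad(actions: List[float], ref: List[State], dt: float,
                  replay_threshold: float) -> Tuple[float, List[float]]:
    """State-matching loss and its analytic gradient w.r.t. every action."""
    # Forward pass: start from the first demonstration state and sum squared
    # state errors. If the position error exceeds replay_threshold, the next
    # step restarts from the demonstration state (hypothetical replay rule).
    cur = ref[0]
    sim: List[State] = []
    replayed: List[bool] = []
    loss = 0.0
    for t, a in enumerate(actions):
        nxt = step(cur, a, dt)
        rx, rv = ref[t + 1]
        loss += (nxt[0] - rx) ** 2 + (nxt[1] - rv) ** 2
        sim.append(nxt)
        replayed.append(abs(nxt[0] - rx) > replay_threshold)
        cur = ref[t + 1] if replayed[-1] else nxt

    # Backward pass (backpropagation through time) through step().
    grads = [0.0] * len(actions)
    gx_in, gv_in = 0.0, 0.0  # gradient arriving from later steps
    for t in reversed(range(len(actions))):
        (nx, nv), (rx, rv) = sim[t], ref[t + 1]
        # A replayed state is demonstration data, so no gradient passes through it.
        fx, fv = (0.0, 0.0) if replayed[t] else (gx_in, gv_in)
        gx = 2.0 * (nx - rx) + fx  # dL / d(simulated position at t+1)
        gv = 2.0 * (nv - rv) + fv  # dL / d(simulated velocity at t+1)
        grads[t] = gx * dt * dt + gv * dt  # chain rule through step()
        gx_in, gv_in = gx, gx * dt + gv    # dL / d(start state of step t)
    return loss, grads


def train(ref: List[State], dt: float, iters: int = 2000, lr: float = 0.1,
          replay_threshold: float = math.inf) -> List[float]:
    """Plain gradient descent on the action sequence, starting from zeros."""
    actions = [0.0] * (len(ref) - 1)
    for _ in range(iters):
        _, grads = loss_and_grad(actions, ref, dt, replay_threshold)
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


if __name__ == "__main__":
    dt = 0.1
    demo = make_reference([math.sin(0.5 * t) for t in range(20)], (0.0, 0.0), dt)
    for thr in (math.inf, 0.05):
        learned = train(demo, dt, replay_threshold=thr)
        # Report the full-rollout (no replay) state-matching loss after training.
        print(thr, loss_and_grad(learned, demo, dt, math.inf)[0])
```

Because the loss is an ordinary differentiable function of the actions, every update uses an exact gradient rather than a sampled return estimate. A replayed step still contributes its local state error, but it blocks gradients coming from later steps, which keeps each backpropagated chain short.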