Data-driven motion priors that guide agents toward naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, which limits their reusability and requires retaining the reference motion data when they are applied to downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverage pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable, task-agnostic motion priors. An SMP can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, it can be kept frozen and reused as a general-purpose reward function to train new policies that produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method creates reusable and modular motion priors that produce high-quality motions comparable to state-of-the-art adversarial imitation learning methods. In our experiments, we demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video available at https://youtu.be/jBA2tWk6vzU
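To make the idea of a frozen diffusion model acting as a reward function more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a toy "dataset" that is an isotropic Gaussian, for which the optimal noise prediction is known in closed form, and it assumes one possible reward shape: the exponential of the negative noise-prediction residual, which is the quantity that SDS uses as a gradient signal. The names `toy_frozen_denoiser` and `sds_style_reward`, the noise-level range, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def toy_frozen_denoiser(x_t, alpha_bar, data_mean, data_std):
    """Stand-in for a pre-trained motion diffusion model (assumption).

    For data distributed as N(data_mean, data_std^2 I), this is the exact
    posterior mean of the noise given the noised sample x_t.
    """
    var_t = alpha_bar * data_std**2 + (1.0 - alpha_bar)
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * data_mean) / var_t


def sds_style_reward(motion, denoiser, rng, n_samples=16, scale=1.0):
    """Hypothetical SDS-style reward: high when the frozen prior can
    explain the injected noise, i.e. when `motion` looks like the data."""
    errors = []
    for _ in range(n_samples):
        # Sample a noise level and noise the motion features (forward diffusion).
        alpha_bar = rng.uniform(0.3, 0.9)
        eps = rng.standard_normal(motion.shape)
        x_t = np.sqrt(alpha_bar) * motion + np.sqrt(1.0 - alpha_bar) * eps
        # The prior stays frozen: it is only queried, never updated.
        eps_hat = denoiser(x_t, alpha_bar)
        errors.append(np.mean((eps_hat - eps) ** 2))
    # Map the average residual to a bounded reward in [0, 1].
    return float(np.exp(-scale * np.mean(errors)))


# Toy "motion dataset": 8-dimensional features centred at zero.
data_mean = np.zeros(8)
data_std = 0.1


def prior(x_t, alpha_bar):
    return toy_frozen_denoiser(x_t, alpha_bar, data_mean, data_std)


# Same seed for both calls so the two motions see identical noise draws.
r_near = sds_style_reward(data_mean.copy(), prior, np.random.default_rng(0))
r_far = sds_style_reward(data_mean + 3.0, prior, np.random.default_rng(0))
print(r_near, r_far)
```

In this toy setting a motion close to the data distribution receives a higher reward than one far from it, and the reward can be computed without access to the reference data or to any task-specific policy. A real system would replace the closed-form denoiser with a trained motion diffusion network and would feed the reward to a reinforcement learning algorithm.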