This work introduces MotionLCM, which extends controllable motion generation to real-time speeds. Existing methods for spatial control in text-conditioned motion generation suffer from significant runtime inefficiency. To address this, we first propose the motion latent consistency model (MotionLCM) for motion generation, built upon the motion latent diffusion model (MLD). By adopting one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model for motion generation. To ensure effective controllability, we incorporate a motion ControlNet within the latent space of MotionLCM, enabling explicit control signals (e.g., a pelvis trajectory) in the vanilla motion space to steer the generation process directly, analogous to how latent-free diffusion models for motion generation are controlled. Together, these techniques allow our approach to generate human motions from text and control signals in real time. Experimental results demonstrate the remarkable generation and control capabilities of MotionLCM while maintaining real-time runtime efficiency.
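The core idea (one-step latent consistency sampling, with a ControlNet-style residual injected into the conditioning) can be illustrated with a minimal toy sketch. This is not the paper's actual implementation; all names (`consistency_fn`, `control_net`, `one_step_sample`) and the toy denoising rule are illustrative assumptions.

```python
# Hypothetical sketch of one-step consistency sampling in a latent space.
# consistency_fn, control_net, and the toy schedule are illustrative
# stand-ins, NOT MotionLCM's actual networks or API.
import random

def consistency_fn(z_t, t, cond):
    # Stand-in for the learned consistency function f(z_t, t, c):
    # maps a noisy latent directly to a clean-latent estimate in one pass.
    scale = 1.0 / (1.0 + t)  # toy denoising schedule
    return [scale * (z + c) for z, c in zip(z_t, cond)]

def control_net(control_signal, weight=0.5):
    # Stand-in for the motion ControlNet: turns an explicit control signal
    # (e.g., a pelvis trajectory) into a residual on the conditioning.
    return [weight * s for s in control_signal]

def one_step_sample(text_cond, control_signal, dim=4, t=1.0, seed=0):
    rng = random.Random(seed)
    z_T = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # initial noise latent
    # Fuse text conditioning with the ControlNet residual, then decode
    # the clean motion latent in a single forward evaluation.
    cond = [c + r for c, r in zip(text_cond, control_net(control_signal))]
    return consistency_fn(z_T, t, cond)

motion_latent = one_step_sample(text_cond=[0.2] * 4, control_signal=[1.0] * 4)
```

The single call to `consistency_fn` is what makes inference real-time: unlike an iterative diffusion sampler, no multi-step denoising loop is needed; the resulting clean latent would then be decoded to motion by the autoencoder's decoder.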