We introduce the Efficient Motion Diffusion Model (EMDM) for fast, high-quality human motion generation. Current state-of-the-art generative diffusion models produce impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works such as motion latent diffusion conduct diffusion within a latent space for efficiency, but learning such a latent space is itself non-trivial. On the other hand, accelerating generation by naively increasing the sampling step size, e.g., with DDIM, often degrades quality because it fails to approximate the complex denoising distribution. To address these issues, we propose EMDM, which captures the complex distribution across multiple sampling steps of the diffusion model, allowing far fewer sampling steps and significant acceleration. This is achieved with a conditional denoising diffusion GAN that captures multimodal data distributions across arbitrary (and potentially large) step sizes conditioned on control signals, enabling few-step motion sampling with high fidelity and diversity. To minimize undesired motion artifacts, geometric losses are imposed during network learning. As a result, EMDM achieves real-time, high-quality motion generation and significantly improves the efficiency of motion diffusion models over existing methods. Our code will be publicly available upon publication.
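The few-step sampling idea can be illustrated with a minimal sketch: a conditional generator predicts the clean motion directly from a noisy sample, and the sampler jumps backward by a large step size. All names here (`G`, the schedule, the step count) are illustrative placeholders under standard DDPM/DDIM conventions, not EMDM's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                         # training-time diffusion steps
K = 10                           # few sampling steps at inference (large step size)
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def G(x_t, t, cond):
    """Placeholder generator: in a denoising diffusion GAN this would be an
    adversarially trained network predicting x_0 from noisy motion x_t and a
    control signal cond (e.g., an encoded text prompt)."""
    return 0.9 * x_t + 0.1 * cond   # dummy denoiser for illustration only

def big_step(x_t, x0_hat, t, s):
    """Jump from step t to t - s using the predicted x_0 (DDIM-style, eta = 0)."""
    ab_t = alpha_bar[t]
    ab_prev = alpha_bar[t - s] if t - s >= 0 else 1.0
    eps_hat = (x_t - np.sqrt(ab_t) * x0_hat) / np.sqrt(1.0 - ab_t)
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat

cond = np.ones(3)                # stand-in for a control signal
x = rng.standard_normal(3)       # start from pure noise
for t in range(T - 1, -1, -T // K):
    s = min(T // K, t + 1)       # final jump lands exactly on the clean sample
    x0_hat = G(x, t, cond)
    x = big_step(x, x0_hat, t, s)

print(x.shape)  # a motion sample after only K generator evaluations
```

The key point the sketch conveys is that the generator must model the full (multimodal) denoising distribution over a large jump, which is what the adversarial training in EMDM is for; a plain Gaussian denoiser would break down at such step sizes.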