Thanks to the powerful generative capacity of diffusion models, recent years have witnessed rapid progress in human motion generation. Existing diffusion-based methods employ disparate network architectures and training strategies, and the effect of each component's design remains unclear. In addition, the iterative denoising process incurs considerable computational overhead, which is prohibitive for real-time scenarios such as virtual characters and humanoid robots. For this reason, we first conduct a comprehensive investigation into network architectures, training strategies, and the inference process. Based on this analysis, we tailor each component for efficient, high-quality human motion generation. Despite its promising performance, the tailored model still suffers from foot skating, a ubiquitous issue in diffusion-based solutions. To eliminate foot skating, we identify foot-ground contact and correct foot motions during the denoising process. By integrating these well-designed components, we present StableMoFusion, a robust and efficient framework for human motion generation. Extensive experimental results show that StableMoFusion performs favorably against current state-of-the-art methods. Project page: https://h-y1heng.github.io/StableMoFusion-page/
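The foot-skating fix described in the abstract has two parts: detecting foot-ground contact, then correcting foot motion while contact holds. The sketch below illustrates that general idea with a common heuristic. It is not StableMoFusion's actual procedure. A frame counts as "in contact" when the foot is low and nearly still horizontally, and each contiguous contact run is pinned to its first horizontal position. The function names, the thresholds, and the y-up `(x, y, z)` convention are hypothetical assumptions. A real system would apply the correction to the clean-motion estimate inside the denoising loop and propagate it to joint rotations, and this sketch models neither step.

```python
from typing import List, Tuple

# (x, y, z) foot-joint position; y is assumed to be the up axis (hypothetical convention).
Vec3 = Tuple[float, float, float]


def detect_contacts(foot: List[Vec3], height_thresh: float = 0.05,
                    speed_thresh: float = 0.01) -> List[bool]:
    """Flag frames where the foot is near the ground and nearly still horizontally.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(foot)
    contacts = []
    for t in range(n):
        # Compare with the previous frame (frame 0 is compared with frame 1).
        other = foot[t - 1] if t > 0 else foot[min(1, n - 1)]
        dx = foot[t][0] - other[0]
        dz = foot[t][2] - other[2]
        horizontal_speed = (dx * dx + dz * dz) ** 0.5
        contacts.append(foot[t][1] <= height_thresh and horizontal_speed <= speed_thresh)
    return contacts


def correct_footskate(foot: List[Vec3], contacts: List[bool]) -> List[Vec3]:
    """Pin the horizontal (x, z) position during each contiguous contact run."""
    corrected = list(foot)
    anchor = None  # (x, z) of the first frame in the current contact run
    for t, in_contact in enumerate(contacts):
        if in_contact:
            if anchor is None:
                anchor = (foot[t][0], foot[t][2])
            # Keep the height, remove the horizontal drift (the "skate").
            corrected[t] = (anchor[0], foot[t][1], anchor[1])
        else:
            anchor = None  # contact broken; the next run gets a new anchor
    return corrected


# A foot that plants for three frames while drifting slightly along x, then lifts off.
traj = [(0.000, 0.02, 0.0), (0.005, 0.01, 0.0), (0.010, 0.01, 0.0), (0.20, 0.15, 0.0)]
contacts = detect_contacts(traj)
fixed = correct_footskate(traj, contacts)
```

Here the first three frames are flagged as contact, and their x coordinates are pinned to 0.0. The lifted frame is left untouched. Pinning to the first contact frame is the simplest choice. Averaging over the run or blending at run boundaries would avoid visible jumps, but this sketch does not do that.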