We present HumanCM, a one-step human motion prediction framework built upon consistency models. Instead of relying on multi-step denoising as in diffusion-based methods, HumanCM performs efficient single-step generation by learning a self-consistent mapping between noisy and clean motion states. The framework adopts a Transformer-based spatiotemporal architecture with temporal embeddings to model long-range dependencies and preserve motion coherence. Experiments on Human3.6M and HumanEva-I demonstrate that HumanCM achieves comparable or superior accuracy to state-of-the-art diffusion models while reducing inference steps by up to two orders of magnitude.