We introduce Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for generating human motion from textual descriptions of multiple actions that leverages the strengths of discrete diffusion models. The approach addresses the challenge of generating multi-motion sequences, ensuring seamless transitions between motions and coherence across a series of actions. The strength of M2D2M lies in its dynamic transition probability within the discrete diffusion model, which adapts transition probabilities based on the proximity between motion tokens, encouraging mixing between different motion modes. Complemented by a two-phase sampling strategy comprising independent and joint denoising steps, M2D2M generates long-term, smooth, and contextually coherent human motion sequences using a model trained only for single-motion generation. Extensive experiments show that M2D2M surpasses state-of-the-art methods on text-to-motion benchmarks, demonstrating its ability to interpret language semantics and generate dynamic, realistic motions.
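To make the dynamic-transition idea concrete, the following is a minimal sketch (not the paper's actual formulation) of how a discrete-diffusion transition matrix could weight corruptions by token proximity: tokens whose codebook embeddings are close receive more transition mass than distant ones. The function name `dynamic_transition_matrix`, the softmax-over-negative-distance weighting, and the single corruption rate `beta` are all illustrative assumptions.

```python
import numpy as np

def dynamic_transition_matrix(codebook: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical proximity-aware transition matrix for one diffusion step.

    codebook: (K, D) array of motion-token embeddings (assumed).
    beta: total probability of corrupting a token at this step.
    Returns Q of shape (K, K), where Q[i, j] = P(token j | token i).
    """
    # Pairwise Euclidean distances between token embeddings.
    d = np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=-1)

    # Closer tokens get larger logits; exclude self-transitions for now.
    logits = -d / (d.mean() + 1e-8)
    np.fill_diagonal(logits, -np.inf)

    # Normalize off-diagonal mass so each row's corruption probability is beta.
    off = np.exp(logits)
    off = off / off.sum(axis=1, keepdims=True) * beta

    Q = off
    np.fill_diagonal(Q, 1.0 - beta)  # probability of keeping the token
    return Q

# Example: 8 random token embeddings, 10% corruption probability.
rng = np.random.default_rng(0)
Q = dynamic_transition_matrix(rng.normal(size=(8, 4)), beta=0.1)
```

Each row of `Q` sums to one, and nearby tokens in embedding space absorb most of the corruption probability, which is one plausible way to encourage smooth mixing between related motion modes during denoising.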