Dance is a form of human motion characterized by emotional expression and communication, playing an important role in fields such as music, virtual reality, and content creation. Existing methods for dance generation often fail to adequately capture the inherently sequential, rhythmic, and music-synchronized characteristics of dance. In this paper, we propose \emph{MambaDance}, a new dance generation approach that leverages a Mamba-based diffusion model. Mamba, well suited to handling long and autoregressive sequences, is integrated into our two-stage diffusion architecture, substituting for the off-the-shelf Transformer. Additionally, considering the critical role of musical beats in dance choreography, we propose a Gaussian-based beat representation that explicitly guides the decoding of dance sequences. Experiments on the AIST++ and FineDance datasets across sequence lengths show that, compared with previous methods, our approach consistently generates plausible dance movements reflecting these essential characteristics, from short to long dances. Additional qualitative results and demo videos are available at {\small\url{https://vision3d-lab.github.io/mambadance}}.
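As a minimal sketch of one form such a Gaussian-based beat representation might take (the abstract does not specify the exact formulation; $b(t)$, $t_i$, and $\sigma$ are illustrative symbols rather than the paper's notation), each detected beat time could be smoothed into a soft, continuous activation curve:
\[
b(t) \;=\; \sum_{i} \exp\!\left(-\frac{(t - t_i)^2}{2\sigma^2}\right),
\]
where $t_i$ denotes the $i$-th musical beat time and $\sigma$ controls the temporal spread of emphasis around each beat, yielding a differentiable signal that can condition the decoder on beat proximity rather than on isolated beat impulses.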