Music-to-dance generation is a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address this challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representation to enhance choreographic consistency. MatchDance employs a two-stage design: (1) a Kinematic-Dynamic-based Quantization Stage (KDQS), which encodes dance motions into a latent representation using Finite Scalar Quantization (FSQ) with kinematic-dynamic constraints and reconstructs them with high fidelity, and (2) a Hybrid Music-to-Dance Generation Stage (HMDGS), which uses a Mamba-Transformer hybrid architecture to map music into the latent representation, followed by the KDQS decoder to generate 3D dance motions. Additionally, we introduce a music-dance retrieval framework and comprehensive metrics for evaluation. Extensive experiments on the FineDance dataset demonstrate state-of-the-art performance.
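To make the quantization step in KDQS more concrete, the following is a minimal NumPy sketch of generic Finite Scalar Quantization: each latent dimension is bounded with `tanh` and rounded to a small, fixed number of levels, and the resulting per-dimension codes are combined into a single token index. It is not the MatchDance implementation. The function names `fsq_quantize` and `codes_to_index`, the level configuration `[5, 5, 3]`, and the restriction to odd level counts are assumptions made for illustration, and the kinematic-dynamic constraints described above are not modeled.

```python
import numpy as np


def fsq_quantize(z, levels):
    """Quantize latents z (..., d) to `levels[i]` values per dimension.

    Hypothetical helper: assumes every entry of `levels` is odd and >= 3,
    so the quantization grid is symmetric around zero.
    """
    levels = np.asarray(levels)
    if np.any(levels < 3) or np.any(levels % 2 == 0):
        raise ValueError("this sketch only supports odd level counts >= 3")
    half = (levels - 1) / 2.0
    # Bound each dimension to (-half, half), then snap to the integer grid.
    bounded = np.tanh(z) * half
    codes = np.round(bounded)
    # Normalize the quantized values to [-1, 1].
    return codes / half


def codes_to_index(q, levels):
    """Map quantized vectors (..., d) to integer token indices."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    # Shift each dimension to the digit range 0 .. levels[i] - 1.
    digits = np.round(q * half).astype(np.int64) + half
    # Mixed-radix basis: 1, L0, L0*L1, ...
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (digits * basis).sum(axis=-1)


levels = [5, 5, 3]  # implicit codebook of 5 * 5 * 3 = 75 entries
z = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, -10.0, -10.0]])
q = fsq_quantize(z, levels)
idx = codes_to_index(q, levels)
```

In training, FSQ is typically paired with a straight-through estimator so gradients pass through the rounding step; that requires an autodiff framework and is omitted here. In a two-stage design like the one described, indices of this kind would be the discrete targets that the music-conditioned generator predicts before the decoder reconstructs motion.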