Music-driven 3D dance generation has attracted increasing attention in recent years, with promising applications in choreography, virtual reality, and creative content creation. Prior work has produced realistic dance movements from audio signals. However, existing methods underutilize genre conditioning, often treating it as an auxiliary modifier rather than a core semantic driver. This oversight compromises music-motion synchronization and disrupts dance genre continuity, particularly during complex rhythmic transitions, leading to visually unsatisfactory results. To address this challenge, we propose MEGADance, a novel architecture for music-driven 3D dance generation. By decoupling choreographic consistency into dance generality and genre specificity, MEGADance achieves high dance quality and strong genre controllability. It consists of two stages: (1) a High-Fidelity Dance Quantization (HFDQ) stage, which encodes dance motions into a latent representation via Finite Scalar Quantization (FSQ) and reconstructs them under kinematic-dynamic constraints, and (2) a Genre-Aware Dance Generation (GADG) stage, which maps music into the latent representation through the synergistic use of a Mixture-of-Experts (MoE) mechanism and a Mamba-Transformer hybrid backbone. Extensive experiments on the FineDance and AIST++ datasets demonstrate the state-of-the-art performance of MEGADance both qualitatively and quantitatively. Code is available at https://github.com/XulongT/MEGADance.
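To make the HFDQ bottleneck concrete, below is a minimal sketch of Finite Scalar Quantization: each latent dimension is bounded to a fixed number of levels and rounded, with a straight-through estimator for gradients. The level counts, latent width, and tensor shapes are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal FSQ sketch, assuming a simplified tanh bounding scheme;
# levels=(8, 5, 5, 5) and the latent width are hypothetical choices.
import torch
import torch.nn as nn

class FSQ(nn.Module):
    def __init__(self, levels=(8, 5, 5, 5)):
        super().__init__()
        # Each latent dimension is quantized to a fixed number of levels;
        # the implicit codebook size is the product of the level counts.
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))

    def forward(self, z):
        # z: (..., len(levels)). Bound each dimension to [-(L-1)/2, (L-1)/2]
        # with tanh, then round to the nearest integer level.
        half = (self.levels - 1) / 2
        bounded = torch.tanh(z) * half
        quantized = torch.round(bounded)
        # Straight-through estimator: use the rounded values in the forward
        # pass, but let gradients flow through the pre-rounding values.
        return bounded + (quantized - bounded).detach()

fsq = FSQ()
z = torch.randn(2, 16, 4)   # hypothetical (batch, frames, latent dim) motion latents
z_q = fsq(z)                # quantized latents for the downstream GADG stage
```

Unlike VQ-VAE-style quantization, FSQ needs no learned codebook or commitment loss, which is one reason it is attractive for motion tokenization.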