We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse-to-fine diffusion architecture and propose characteristic dance primitives, which possess significant expressiveness, as intermediate representations between the two diffusion models. The first stage, global diffusion, focuses on comprehending the coarse-level music-dance correlation and producing characteristic dance primitives. In contrast, the second stage, local diffusion, generates detailed motion sequences in parallel under the guidance of the dance primitives and choreographic rules. In addition, we propose a Foot Refine Block to optimize the contact between the feet and the ground, enhancing the physical realism of the motion. Our approach can generate extremely long dance sequences in parallel, striking a balance between global choreographic patterns and local motion quality and expressiveness. Extensive experiments validate the efficacy of our method.
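The abstract does not say how the two stages are wired together. What follows is a toy sketch, not Lodge's implementation. It shows one reason a coarse-to-fine split can allow parallel generation: if the coarse stage fixes a pose at every segment boundary, each local segment depends only on its own two boundary poses, so all segments can be filled independently. Several pieces are hypothetical. These include the names `anchor_frames`, `lerp_segment` and `generate`, the fixed anchor spacing `step`, and treating a "primitive" as a single pose. Linear interpolation also stands in for the local diffusion model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Pose = List[float]  # toy stand-in for one frame of motion parameters (assumption)


def anchor_frames(num_frames: int, step: int) -> List[int]:
    """Hypothetical: frame indices where the coarse stage places a boundary pose."""
    if num_frames < 2 or step < 1:
        raise ValueError("need num_frames >= 2 and step >= 1")
    return list(range(0, num_frames - 1, step)) + [num_frames - 1]


def lerp_segment(start: Pose, end: Pose, n_steps: int) -> List[Pose]:
    """Stand-in for a local model (NOT diffusion): interpolate start -> end, both ends included."""
    return [
        [a + (b - a) * (t / n_steps) for a, b in zip(start, end)]
        for t in range(n_steps + 1)
    ]


def generate(
    num_frames: int,
    step: int,
    global_stage: Callable[[int], Pose],  # hypothetical: anchor frame index -> boundary pose
    local_stage: Callable[[Pose, Pose, int], List[Pose]] = lerp_segment,
) -> List[Pose]:
    anchors = anchor_frames(num_frames, step)
    # Coarse pass: one boundary pose per anchor frame.
    primitives = [global_stage(i) for i in anchors]
    # Each job depends only on its two boundary poses, so the jobs are independent.
    jobs: List[Tuple[Pose, Pose, int]] = [
        (primitives[k], primitives[k + 1], anchors[k + 1] - anchors[k])
        for k in range(len(anchors) - 1)
    ]
    # Fine pass: fill every segment concurrently.
    with ThreadPoolExecutor() as pool:
        segments = list(pool.map(lambda job: local_stage(*job), jobs))
    # Stitch: neighbouring segments share a boundary frame, so drop the duplicate.
    frames = list(segments[0])
    for seg in segments[1:]:
        frames.extend(seg[1:])
    return frames


if __name__ == "__main__":
    toy_primitive = lambda i: [float(i), 0.0]  # hypothetical coarse-stage output
    out = generate(10, 4, toy_primitive)
    print(len(out), out[5])
```

In this toy, the coarse stage is the only sequential part, and it produces just a handful of poses. The per-segment work runs concurrently and is stitched at shared boundary frames. A real system would replace `lerp_segment` with a learned, music-conditioned model.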