Music-driven dance generation aims to create coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; 2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and 3) the long duration of dance sequences, which makes it difficult to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Because the original dance sequence is redundant along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model in that latent space. To address the data gap, we construct a large-scale music-dance dataset, the ChoreoSpectrum3D Dataset, which includes four dance genres and has a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.