Generating full-body, multi-genre dance sequences from music is challenging because existing datasets are limited and fine-grained hand motion and dance genres are inherently complex. To address these problems, we propose FineDance, a dataset containing 14.6 hours of paired music-dance data with fine-grained hand motions, fine-grained genre labels (22 dance genres), and accurate posture. To the best of our knowledge, FineDance is the largest music-dance paired dataset and covers the most dance genres. Additionally, to address the monotonous and unnatural hand movements produced by previous methods, we propose FineNet, a full-body dance generation network that uses the diverse generation capability of diffusion models to reduce monotony and expert networks to reduce unrealistic motion. To further improve genre matching and the long-term stability of generated dances, we propose a Genre&Coherent aware Retrieval Module. We also propose a new metric, the Genre Matching Score, to evaluate how well a generated dance matches the genre of its music. Quantitative and qualitative experiments demonstrate the quality of FineDance and the state-of-the-art performance of FineNet. The FineDance dataset and more qualitative samples can be found at our website.