Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizing conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in 2D-MM Align score and established metrics. Video results can be found on our project page: https://DabFusion.github.io.

翻译：舞蹈生成任务从音乐到动作的转换至关重要，但现有方法主要生成关节序列，导致输出缺乏直观性，且因需精确关节标注导致数据采集复杂。我们提出《舞动节拍》扩散模型（DabFusion），该模型将音乐作为条件输入，基于图像到视频的条件生成原理，直接从静态图像生成舞蹈视频。该方法开创性地将音乐作为图像-视频合成中的条件因素。模型分两阶段实现：先训练自编码器预测参考帧与驱动帧间的潜在光流（无需关节标注），再训练基于U-Net的扩散模型，根据CLAP编码的音乐节奏生成潜在光流。尽管能生成高质量舞蹈视频，基线模型在节奏对齐方面存在不足。我们通过添加节拍信息增强模型同步性能，并提出二维运动-音乐对齐评分（2D-MM Align）进行量化评估。在AIST++数据集上的评估显示，增强模型在2D-MM对齐评分及现有指标上均有显著提升。视频结果见项目主页：https://DabFusion.github.io。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日