Choreographers determine what a dance looks like, while camera operators determine how it is ultimately presented. Recently, various methods and datasets have demonstrated the feasibility of dance synthesis. However, camera movement synthesis conditioned on music and dance remains an unsolved, challenging problem due to the scarcity of paired data. We therefore present DCM, a new multi-modal 3D dataset that, for the first time, combines camera movement with dance motion and music audio. The dataset comprises 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community, covering 4 music genres. With this dataset, we find that dance camera movement is multifaceted and human-centric, and is shaped by multiple influencing factors, making dance camera synthesis more challenging than camera or dance synthesis alone. To overcome these difficulties, we propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy. For evaluation, we devise new metrics measuring camera movement quality, diversity, and dancer fidelity. Using these metrics, we conduct extensive experiments on the DCM dataset, providing quantitative and qualitative evidence of the effectiveness of DanceCamera3D. Code and video demos are available at https://github.com/Carmenw1203/DanceCamera3D-Official.
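Since the abstract only names the condition separation strategy, the sketch below illustrates one plausible reading of it: classifier-free guidance in which the music and dance-pose conditions are dropped and re-weighted independently at sampling time, so each condition gets its own guidance strength. All names, tensor shapes, and guidance weights here (separated_cfg, DummyDenoiser, w_music, w_pose) are hypothetical illustrations under that assumption, not the paper's implementation.

```python
# A minimal sketch of two-condition classifier-free guidance; the model
# interface and shapes are assumptions, not the DanceCamera3D code.
import torch

def separated_cfg(model, x_t, t, music, pose, w_music=1.0, w_pose=2.5):
    """Combine three denoiser passes so the music and dance-pose
    conditions receive independent guidance strengths."""
    eps_uncond = model(x_t, t, music=None, pose=None)  # fully unconditional
    eps_music = model(x_t, t, music=music, pose=None)  # music only
    eps_full = model(x_t, t, music=music, pose=pose)   # music + dance pose
    return (eps_uncond
            + w_music * (eps_music - eps_uncond)
            + w_pose * (eps_full - eps_music))

# Toy denoiser standing in for the transformer-based diffusion model.
class DummyDenoiser(torch.nn.Module):
    def forward(self, x_t, t, music=None, pose=None):
        out = x_t.clone()
        if music is not None:
            out = out + 0.1 * music
        if pose is not None:
            out = out + 0.1 * pose.mean(dim=-1, keepdim=True).expand_as(out)
        return out

if __name__ == "__main__":
    B, D = 2, 8                  # batch size, camera-parameter dimension
    x_t = torch.randn(B, D)      # noisy camera trajectory sample
    music = torch.randn(B, D)    # music feature (hypothetical shape)
    pose = torch.randn(B, D)     # dance pose feature (hypothetical shape)
    eps = separated_cfg(DummyDenoiser(), x_t, t=torch.tensor(10),
                        music=music, pose=pose)
    print(eps.shape)             # torch.Size([2, 8])
```

Separating the conditions this way lets the sampler, for example, follow the dancer's body closely (large w_pose) while treating the music as a softer stylistic cue (small w_music); a single merged condition would not allow this trade-off.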