Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw-audio generation, while the multi-instrument scenario remains under-explored. The challenges of dance-driven multi-instrument music (MIDI) generation are twofold: 1) there is no publicly available paired dataset of multi-instrument MIDI and dance video, and 2) the correlation between music and video is weak. To tackle these challenges, we build the first multi-instrument MIDI and dance paired dataset (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. Specifically, 1) to capture the relationship between dance and music, we employ a Graph Convolutional Network to encode the dance motion, extracting features related to dance movement and dance style; 2) to generate a harmonious rhythm, we use a Transformer model with a cross-attention mechanism to decode the drum-track sequence; and 3) we model the generation of the remaining tracks, conditioned on the drum track, as a sequence understanding and completion task, employing a BERT-like model to comprehend the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance.
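To make the described pipeline more concrete, below is a minimal PyTorch-style sketch of a graph-convolutional motion encoder feeding a cross-attention Transformer decoder for the drum track. All class names, dimensions, and the placeholder adjacency matrix are illustrative assumptions; the abstract does not specify the actual Dance2MIDI implementation, and the BERT-like track-completion stage is omitted.

```python
# Hypothetical sketch of the dance-to-drum-track pipeline (not the authors' code).
import torch
import torch.nn as nn

class MotionGCNEncoder(nn.Module):
    """Graph-convolutional encoder over skeleton joints, applied per frame."""
    def __init__(self, num_joints=17, in_dim=2, hidden=128, adjacency=None):
        super().__init__()
        # Normalized joint adjacency; identity is a placeholder assumption.
        A = adjacency if adjacency is not None else torch.eye(num_joints)
        self.register_buffer("A", A)
        self.proj = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(num_joints * hidden, hidden)

    def forward(self, pose):  # pose: (batch, frames, joints, in_dim)
        # Graph convolution: mix joint features along the skeleton adjacency.
        h = torch.relu(torch.einsum("jk,btkd->btjd", self.A, self.proj(pose)))
        return self.out(h.flatten(2))  # (batch, frames, hidden) motion features

class DrumTrackDecoder(nn.Module):
    """Transformer decoder that cross-attends to motion features to emit drum tokens."""
    def __init__(self, vocab_size=512, hidden=128, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerDecoderLayer(hidden, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, layers)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, drum_tokens, motion_feats):  # drum_tokens: (batch, seq)
        tgt = self.embed(drum_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        # Self-attention over drum tokens plus cross-attention to motion features.
        h = self.decoder(tgt, motion_feats, tgt_mask=mask)
        return self.head(h)  # next-token logits for the drum-track sequence
```

Under these assumptions, the decoder would be trained with teacher forcing on paired (motion, drum-token) sequences, and the remaining instrument tracks would then be filled in by the BERT-like completion model described above.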