This paper introduces HumTrans, a publicly available dataset designed primarily for humming melody transcription. It can also serve as a foundation for downstream tasks such as humming-melody-based music generation. The dataset consists of 500 musical compositions spanning different genres and languages, each divided into multiple segments, for a total of 1000 music segments. To collect the recordings, we employed 10 college students, all of whom are either music majors or proficient in at least one musical instrument. Each student hummed every segment twice using a web recording interface on a website we built for this purpose. The recordings were sampled at 44,100 Hz. During a humming session, the main interface displays the musical score for reference and plays the melody audio simultaneously, helping the student capture both melody and rhythm. The dataset contains approximately 56.22 hours of audio, making it the largest humming dataset known to date. The dataset will be released on Hugging Face, and we will provide a GitHub repository containing baseline results and evaluation code.