This paper introduces HumTrans, a publicly available dataset designed primarily for humming melody transcription. It can also serve as a foundation for downstream tasks such as humming-melody-based music generation. The dataset consists of 500 musical compositions spanning different genres and languages, each divided into multiple segments, for a total of 1,000 music segments. To collect the humming recordings, we recruited 10 college students, all of whom are either music majors or proficient in at least one musical instrument. Each participant hummed every segment twice using the web recording interface on a website we designed, and the recordings were sampled at 44,100 Hz. During each humming session, the main interface displayed the musical score for reference while the melody audio played simultaneously, helping participants capture both melody and rhythm. The dataset contains approximately 56.22 hours of audio, making it the largest humming dataset known to date. It will be released on Hugging Face, together with a GitHub repository containing baseline results and evaluation code.
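The collection figures stated above determine the scale of the corpus. The sketch below derives the implied number of recordings, the mean recording length, and the total sample count from those figures. It assumes that every participant recorded every segment exactly twice and that no takes were discarded, so the derived values are estimates, not reported statistics. The helper function names are hypothetical and illustrative only.

```python
# Dataset-scale arithmetic derived from the figures stated in the abstract.
# Assumption (not stated by the authors): every participant recorded every
# segment exactly twice and no takes were discarded.

SEGMENTS = 1000          # music segments in HumTrans
PARTICIPANTS = 10        # college students who hummed
TAKES_PER_SEGMENT = 2    # each segment hummed twice per participant
TOTAL_HOURS = 56.22      # approximate total audio duration
SAMPLE_RATE_HZ = 44_100  # recording sample rate


def recording_count(segments: int, participants: int, takes: int) -> int:
    """Number of recordings if every participant hums every segment `takes` times."""
    return segments * participants * takes


def mean_duration_seconds(total_hours: float, n_recordings: int) -> float:
    """Average recording length implied by the total duration."""
    return total_hours * 3600 / n_recordings


def total_samples(total_hours: float, sample_rate_hz: int) -> int:
    """Total number of audio samples per channel across the whole dataset."""
    return round(total_hours * 3600 * sample_rate_hz)


n = recording_count(SEGMENTS, PARTICIPANTS, TAKES_PER_SEGMENT)
print(n)                                                 # 20000
print(round(mean_duration_seconds(TOTAL_HOURS, n), 2))   # 10.12
print(total_samples(TOTAL_HOURS, SAMPLE_RATE_HZ))        # 8925487200
```

Under this assumption, the corpus holds about 20,000 recordings with an average length of roughly 10 seconds. If some takes were discarded, the count would be lower and the average length correspondingly higher.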