Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture the multi-scale, multi-dimensional structural information in note sequences, owing to the domain knowledge discrepancy between text and music. Moreover, the lack of large-scale symbolic melody datasets limits the gains available from pre-training. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design melodic n-gram and long span sampling strategies to create local and global blank infilling tasks, which model the local and global structures in melodies, respectively. Specifically, we incorporate pitch n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram blank infilling tasks to model the multi-dimensional structures in melodies. To support this, we construct a large-scale symbolic melody dataset, MelodyNet, containing more than 0.4 million melody pieces. MelodyNet is used both for large-scale pre-training and for constructing a domain-specific n-gram lexicon. Subjective and objective evaluations both demonstrate that MelodyGLM outperforms standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM achieves average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
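
To make the local and global blank infilling tasks more concrete, the following is a minimal Python sketch of how masked training examples might be built. It is an illustration under stated assumptions, not the MelodyGLM implementation: the `(pitch, duration)` note encoding, the `[MASK]` token, the toy lexicon, the masking budget, and all function names are hypothetical. Passing a different `key` projects notes onto pitch, rhythm, or both, which loosely corresponds to the pitch, rhythm, and combined n-grams mentioned above.

```python
import random
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Note = Tuple[int, int]   # (MIDI pitch, duration in ticks); hypothetical encoding
Span = Tuple[int, int]   # half-open index range [start, end)
MASK = "[MASK]"          # hypothetical blank token


def find_ngram_spans(notes: Sequence[Note], lexicon: Set[tuple], n: int,
                     key: Callable[[Note], Hashable]) -> List[Span]:
    """Return every span of n notes whose projected n-gram is in the lexicon."""
    spans = []
    for i in range(len(notes) - n + 1):
        gram = tuple(key(note) for note in notes[i:i + n])
        if gram in lexicon:
            spans.append((i, i + n))
    return spans


def select_non_overlapping(spans: Sequence[Span], budget: int,
                           rng: random.Random) -> List[Span]:
    """Pick shuffled, non-overlapping spans covering at most `budget` notes."""
    candidates = list(spans)
    rng.shuffle(candidates)
    chosen: List[Span] = []
    used = 0
    for start, end in candidates:
        if used + (end - start) > budget:
            continue
        if all(end <= s or start >= e for s, e in chosen):
            chosen.append((start, end))
            used += end - start
    return sorted(chosen)


def long_span(length: int, ratio: float, rng: random.Random) -> List[Span]:
    """Sample one contiguous span covering roughly `ratio` of the sequence."""
    n = max(1, int(length * ratio))
    start = rng.randint(0, length - n)
    return [(start, start + n)]


def mask_spans(notes: Sequence[Note], spans: Sequence[Span]):
    """Replace each sorted, non-overlapping span with one MASK token."""
    corrupted: list = []
    targets: List[List[Note]] = []
    pos = 0
    for start, end in spans:
        corrupted.extend(notes[pos:start])
        corrupted.append(MASK)
        targets.append(list(notes[start:end]))
        pos = end
    corrupted.extend(notes[pos:])
    return corrupted, targets


# Toy melody in which the pitch motif 60-62-64 occurs twice.
melody = [(60, 2), (62, 2), (64, 4), (60, 2), (62, 2), (64, 4), (67, 8)]
pitch_lexicon = {(60, 62, 64)}   # toy stand-in for a pitch n-gram lexicon
rng = random.Random(0)

# Local task: mask one occurrence of a lexicon pitch n-gram.
pitch_spans = find_ngram_spans(melody, pitch_lexicon, 3, key=lambda note: note[0])
local_spans = select_non_overlapping(pitch_spans, budget=3, rng=rng)
local_input, local_targets = mask_spans(melody, local_spans)

# Global task: mask a single long span.
global_spans = long_span(len(melody), ratio=0.5, rng=rng)
global_input, global_targets = mask_spans(melody, global_spans)
```

Here each masked span collapses to a single blank token and the removed notes become the infilling target. Using `key=lambda note: note[1]` would match rhythm n-grams instead, and `key=lambda note: note` would match combined pitch and rhythm n-grams. How the actual framework samples spans, orders the targets, and mixes the tasks is not specified in the text above.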