Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture multi-scale, multi-dimensional structural information in note sequences because of the domain knowledge discrepancy between text and music. Moreover, the lack of large-scale symbolic melody datasets limits the gains that pre-training can deliver. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design melodic n-gram and long span sampling strategies to create local and global blank infilling tasks that model the local and global structures in melodies. Specifically, we incorporate pitch n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram blank infilling tasks to model the multi-dimensional structures in melodies. To support this framework, we construct a large-scale symbolic melody dataset, MelodyNet, containing more than 0.4 million melody pieces. MelodyNet is used both for large-scale pre-training and for constructing the domain-specific n-gram lexicon. Both subjective and objective evaluations demonstrate that MelodyGLM surpasses the standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM achieves average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
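To make the blank infilling idea concrete, the following Python sketch shows one way the two sampling strategies could corrupt a note sequence. It is an illustration under stated assumptions, not the MelodyGLM implementation: the abstract does not specify the note encoding, the lexicon format, or the masking ratios. The `(pitch, duration)` tuple encoding, the `[MASK]` placeholder, the pitch-only lexicon, and the function names `find_ngram_spans`, `sample_long_span`, and `blank_infill` are all hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple, Union

# Hypothetical encoding: a note is (MIDI pitch, duration in ticks).
Note = Tuple[int, int]
MASK = "[MASK]"  # assumed placeholder token, not taken from the paper


def find_ngram_spans(
    notes: Sequence[Note], lexicon: Set[Tuple[int, ...]], n: int
) -> List[Tuple[int, int]]:
    """Local strategy: return (start, end) spans whose pitch n-gram is in the lexicon."""
    spans = []
    for i in range(len(notes) - n + 1):
        pitches = tuple(pitch for pitch, _ in notes[i : i + n])
        if pitches in lexicon:
            spans.append((i, i + n))
    return spans


def sample_long_span(
    notes: Sequence[Note], ratio: float, rng: random.Random
) -> Tuple[int, int]:
    """Global strategy: sample one contiguous span covering roughly `ratio` of the notes."""
    length = min(len(notes), max(1, int(len(notes) * ratio)))
    start = rng.randrange(0, len(notes) - length + 1)
    return (start, start + length)


def blank_infill(
    notes: Sequence[Note], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[Union[Note, str]], List[List[Note]]]:
    """Replace each non-overlapping span with MASK; return (corrupted input, targets)."""
    corrupted: List[Union[Note, str]] = []
    targets: List[List[Note]] = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already masked
        corrupted.extend(notes[cursor:start])
        corrupted.append(MASK)
        targets.append(list(notes[start:end]))
        cursor = end
    corrupted.extend(notes[cursor:])
    return corrupted, targets


if __name__ == "__main__":
    melody = [(60, 4), (62, 4), (64, 4), (60, 4), (62, 4), (65, 8)]
    pitch_lexicon = {(60, 62)}  # toy lexicon; a real one would be mined from a corpus

    local_spans = find_ngram_spans(melody, pitch_lexicon, n=2)
    print(blank_infill(melody, local_spans))

    global_span = sample_long_span(melody, ratio=0.5, rng=random.Random(0))
    print(blank_infill(melody, [global_span]))
```

A rhythm n-gram or combined n-gram variant would match on the duration field, or on the full tuple, instead of the pitch field. In a training setup, the model would be asked to regenerate each target span given the corrupted sequence.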