Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants, based on absolute, relative, and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. For comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.