Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants, based on absolute, relative, and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. We compare against multiple baselines from the literature and demonstrate the merits of our methods using several musically motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.