Despite innovations in deep learning and generative AI, producing long-term structure, as well as the layers of repeated structure common in musical works, remains an open challenge in music generation. We propose an attention layer that takes a novel approach, applying a user-supplied self-similarity matrix to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator (SING) system, a two-layer deep learning autonomous music generation system. The first layer is a vanilla Long Short-Term Memory (LSTM) layer; the second is the proposed attention layer. During generation, this attention mechanism imposes a suggested structure from a template piece on the generated music. We train SING on the MAESTRO dataset using a novel variable batching method and compare its performance to that of the same model without the attention mechanism. The proposed attention mechanism significantly improves the network's ability to replicate specific structures, and the model performs better on an unseen test set than one without the attention mechanism.
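To make the core idea concrete, the following is a minimal sketch of how an attention step weighted by a template self-similarity matrix might look. It is an illustration only, not the paper's implementation: the function name `similarity_attention`, the softmax weighting, and the toy shapes are all assumptions; the abstract specifies only that a user-supplied self-similarity matrix is applied to previous time steps.

```python
import numpy as np

def similarity_attention(hidden_states, sim_matrix, t):
    """Attend over previous time steps, weighting each past step by its
    template self-similarity to the current step t (illustrative sketch).

    hidden_states: (T, d) array of recurrent-layer outputs so far.
    sim_matrix:    (T, T) user-supplied self-similarity matrix from a
                   template piece; sim_matrix[t, s] scores how strongly
                   step t should echo step s.
    t:             current time step; attention runs over steps 0..t-1.
    """
    scores = sim_matrix[t, :t]               # similarity of each past step to step t
    weights = np.exp(scores - scores.max())  # softmax over past steps (assumed here)
    weights /= weights.sum()
    return weights @ hidden_states[:t]       # (d,) context vector

# Toy example: 4 time steps, 3-dimensional hidden states.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))
S = np.array([[1.0, 0.2, 0.9, 0.1],
              [0.2, 1.0, 0.3, 0.8],
              [0.9, 0.3, 1.0, 0.2],
              [0.1, 0.8, 0.2, 1.0]])
ctx = similarity_attention(H, S, t=3)
```

Under this reading, rows of the template's self-similarity matrix act as a structural prior: steps the template marks as similar to the current position contribute more to the context, which is what lets a template piece's repetition structure steer generation.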