Despite the innovations in deep learning and generative AI, creating long-term structure, as well as the layers of repeated structure common in musical works, remains an open challenge in music generation. We propose an attention layer that takes a novel approach, applying user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator (SING) system, a deep learning autonomous music generation system with two layers. The first is a vanilla Long Short-Term Memory (LSTM) layer, and the second is the proposed attention layer. During generation, this attention mechanism imposes a suggested structure from a template piece on the generated music. We train SING on the MAESTRO dataset using a novel variable batching method and compare its performance to that of the same model without the attention mechanism. The addition of our proposed attention mechanism significantly improves the network's ability to replicate specific structures, and the resulting model performs better on an unseen test set than a model without the attention mechanism.
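To make the idea of applying a self-similarity matrix to previous time steps concrete, the following is a minimal sketch and not the SING implementation. It assumes that the template's self-similarity matrix is built from cosine similarity between per-step feature vectors, and that one row of that matrix is turned into attention weights with a softmax. The function names `cosine_ssm` and `similarity_attention`, the toy template, and the random stand-in for recurrent-layer outputs are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_ssm(features):
    """Self-similarity matrix from per-step feature vectors (assumed: cosine similarity)."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def similarity_attention(prev_outputs, ssm_row):
    """Weight earlier outputs by the current step's similarity to each of them.

    prev_outputs: shape (t, d), outputs for steps 0..t-1.
    ssm_row:      shape (t,), similarity of step t to steps 0..t-1.
    """
    prev_outputs = np.asarray(prev_outputs, dtype=float)
    ssm_row = np.asarray(ssm_row, dtype=float)
    # Softmax over similarity scores (assumed normalization), shifted for stability.
    scores = np.exp(ssm_row - ssm_row.max())
    weights = scores / scores.sum()
    # Context vector: similarity-weighted average of the earlier outputs.
    return weights @ prev_outputs, weights


# Toy template with an A-B-A-B structure (hypothetical features).
template = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
ssm = cosine_ssm(template)

# Random stand-in for the outputs of a recurrent layer: 4 steps, 3 dimensions.
hidden = np.random.default_rng(0).normal(size=(4, 3))

t = 2  # step 2 repeats section A, so it should attend more to step 0 than to step 1
context, weights = similarity_attention(hidden[:t], ssm[t, :t])
```

In this toy case the template row for step 2 marks step 0 as similar and step 1 as dissimilar, so the weighted context leans toward the output at step 0. How SING actually normalizes the similarity values and combines the result with the LSTM output is not specified in the abstract.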