This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model designed to generate melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer uses techniques including bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide TunesFormer in producing melodies that conform to the desired musical forms. Our evaluation shows that TunesFormer is 3.22 times faster than GPT-2 and 1.79 times faster than a linear-complexity model of equal scale, while offering comparable performance in controllability and other metrics. TunesFormer provides a novel tool for musicians, composers, and music enthusiasts alike to explore the vast landscape of Irish music. Our model and code are available at https://github.com/sander-wood/tunesformer.
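To give an intuition for why bar patching shortens the sequence the model must process, the following minimal sketch groups the characters of a tune into one patch per bar. It is an illustration under stated assumptions, not the released implementation: it assumes the tune is written as text with `|` as the only barline symbol, and the function name `split_into_bars` and the example tune are hypothetical.

```python
def split_into_bars(tune_body: str) -> list[str]:
    """Split a text-encoded tune into bar-level patches.

    Assumption: bars are delimited by a plain "|" character; repeat signs
    and double barlines are not handled in this sketch.
    """
    bars = []
    current = ""
    for ch in tune_body:
        current += ch
        if ch == "|":  # a barline closes the current patch
            bars.append(current)
            current = ""
    if current.strip():  # keep a trailing bar with no closing barline
        bars.append(current)
    return bars


# Hypothetical four-bar example tune.
tune = "GABc dedB|dedB dedB|c2ec B2dB|c2A2 A2BA|"
bars = split_into_bars(tune)

char_level_length = len(tune)   # one position per character
patch_level_length = len(bars)  # one position per bar

print(bars)
print(char_level_length, patch_level_length)
```

In this toy case a sequence-level model would attend over one position per bar rather than one per character, which is the kind of length reduction the abstract attributes to bar patching; how the characters inside each patch are encoded and decoded is left out of the sketch.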