This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model designed to generate melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer uses techniques including bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide the model toward melodies that conform to the desired musical form. In our evaluation, TunesFormer is 3.22 times faster than GPT-2 and 1.79 times faster than a linear-complexity model of equal scale, while offering comparable performance in controllability and other metrics. TunesFormer gives musicians, composers, and music enthusiasts a new tool for exploring Irish music. Our model and code are available at https://github.com/sander-wood/tunesformer.
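To make the effect of bar patching on sequence length concrete, the following minimal sketch groups the characters of a tune into one patch per bar. It assumes the tune is written in ABC notation and that bar lines are the only patch boundaries; the function name `split_into_bar_patches`, the `BAR_LINE` pattern, and the example tune are illustrative assumptions, not taken from the released code. It covers only the segmentation step, not how patches are embedded or decoded.

```python
import re
from typing import List

# Common ABC bar-line tokens (an assumed, non-exhaustive set). Longer tokens
# come first so that "|]" or "::" is matched as a single delimiter.
BAR_LINE = re.compile(r"(\|\]|\|\||\[\||\|:|:\||::|\|)")


def split_into_bar_patches(abc_body: str) -> List[str]:
    """Split an ABC tune body into patches, one patch per bar.

    Each patch keeps its closing bar line, and a bar line that opens the
    tune (such as "|:") is attached to the first bar, so concatenating the
    patches reproduces the input string.
    """
    patches: List[str] = []
    current = ""
    has_content = False  # True once the current patch holds non-bar-line text
    for part in BAR_LINE.split(abc_body):
        if not part:
            continue
        is_bar_line = BAR_LINE.fullmatch(part) is not None
        current += part
        if is_bar_line and has_content:
            # A bar line after musical content closes the current patch.
            patches.append(current)
            current = ""
            has_content = False
        elif not is_bar_line and part.strip():
            has_content = True
    if current:
        # Keep any trailing text that is not closed by a bar line.
        patches.append(current)
    return patches


# Hypothetical two-bar fragment used only for illustration.
body = "|:G2 AB c2 BA|G2 AB c2 d2:|"
patches = split_into_bar_patches(body)
print(f"{len(body)} characters -> {len(patches)} patches")
```

Under these assumptions, a model operating on patches processes one position per bar rather than one per character, which is the sense in which bar patching shortens the sequence.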