Loopable music generation systems enable diverse applications, but they often lack controllability and customization capabilities. We argue that finer control would make these models more useful, and that emotional expression is a crucial dimension of control for both creators and listeners. Building on LooperGP, a loopable tablature generation model, this paper explores giving such systems control over the emotion they convey. To enable this conditional generation, we propose integrating musical knowledge through multi-granular semantic and musical features during model training and inference. Specifically, we combine song-level features (emotion label, tempo, and mode) with bar-level features (tonal tension) to guide emotional expression. Through algorithmic and human evaluations, we demonstrate the approach's effectiveness in producing music that conveys two contrasting target emotions, happiness and sadness. We also conduct an ablation study to clarify which factors contribute to these results.