Loopable music generation systems enable diverse applications, but they often lack controllability and customization. We argue that greater controllability can enrich these models, and that emotional expression is a crucial aspect for both creators and listeners. Building on LooperGP, a loopable tablature generation model, this paper therefore explores giving such systems control over the emotions they convey. To enable this conditional generation, we propose integrating musical knowledge by using multi-granular semantic and musical features during model training and inference. Specifically, we combine song-level features (Emotion Labels, Tempo, and Mode) with bar-level features (Tonal Tension) to guide emotional expression. Through algorithmic and human evaluations, we demonstrate the approach's effectiveness in producing music that conveys two contrasting target emotions, happiness and sadness. We also conduct an ablation study to clarify which factors contribute to these results.
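One way to picture the multi-granular conditioning described in the abstract is as control tokens added to the symbolic token sequence: song-level tokens once at the start, and a tension token at the start of each bar. The sketch below is only an illustration under assumptions. The token names (`emotion_happy`, `tempo_bin_2`, `tension_bin_0`, `new_bar`), the bin edges, and the function names are all hypothetical; the abstract does not specify how LooperGP or this paper encodes these features.

```python
from bisect import bisect_right

# Hypothetical bin edges; the abstract does not state the actual quantization.
TEMPO_EDGES = [80, 120, 160]        # BPM boundaries, giving 4 tempo bins
TENSION_EDGES = [0.25, 0.5, 0.75]   # normalized tension boundaries, giving 4 bins


def song_level_tokens(emotion, tempo_bpm, mode):
    """Build the song-level control tokens (emotion label, tempo bin, mode)."""
    tempo_bin = bisect_right(TEMPO_EDGES, tempo_bpm)
    return [f"emotion_{emotion}", f"tempo_bin_{tempo_bin}", f"mode_{mode}"]


def add_controls(bars, tensions, emotion, tempo_bpm, mode):
    """Prefix song-level tokens, then tag every bar with a tension-bin token.

    bars     -- list of bars, each a list of (assumed) music tokens
    tensions -- one normalized tonal-tension value per bar
    """
    if len(bars) != len(tensions):
        raise ValueError("need exactly one tension value per bar")
    out = song_level_tokens(emotion, tempo_bpm, mode)
    for bar_tokens, tension in zip(bars, tensions):
        tension_bin = bisect_right(TENSION_EDGES, tension)
        out.append("new_bar")                      # assumed bar delimiter
        out.append(f"tension_bin_{tension_bin}")   # bar-level control token
        out.extend(bar_tokens)
    return out
```

In a setup like this, the same control tokens would be present during training and supplied by the user at inference time, so that choosing, for example, a happy label with a fast tempo and major mode steers generation toward that target emotion.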