Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most ignore the emotions underlying the music. In addition, the harmonies generated by previous methods show insufficient variability. To address these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the rich variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece level and bar level) to model global and local musical properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Objective evaluation shows that our proposed model outperforms other LSTM-based models. From the subjective evaluation, we conclude that altering the chords alone hardly changes the overall emotion of the music. Qualitative analysis demonstrates that our model can generate varied harmonies.
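As an illustration of the attention-based melody context vector mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes scaled dot-product scoring between the current decoder state and the encoded melody steps; the abstract does not specify the scoring function, and the names `melody_context`, `decoder_state`, and `melody_states` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def melody_context(decoder_state, melody_states):
    """Compute an attention context vector for one decoding step.

    decoder_state: list of floats, the current chord-decoder hidden state.
    melody_states: list of equal-length float lists, one per melody step.
    Assumption: scaled dot-product scoring (not confirmed by the source).
    Returns (context, weights).
    """
    scale = math.sqrt(len(decoder_state))
    # Score each melody step against the current decoder state.
    scores = [
        sum(q * k for q, k in zip(decoder_state, h)) / scale
        for h in melody_states
    ]
    weights = softmax(scores)
    dim = len(melody_states[0])
    # The context is the attention-weighted sum of the melody states.
    context = [
        sum(w * h[i] for w, h in zip(weights, melody_states))
        for i in range(dim)
    ]
    return context, weights


# Toy example: three encoded melody steps with two features each.
melody_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
context, weights = melody_context(decoder_state, melody_states)
```

In a model of the kind described, a context vector like this would be recomputed at every chord step and fed to the decoder alongside the latent variables and emotion condition; how those inputs are combined is not stated in the abstract.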