Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most ignore the emotions underlying the music. In addition, the harmonies generated by previous methods lack variability. To address these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization while improving the quality of generated harmonies and capturing the rich variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at two levels (piece and bar) to model global and local musical properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Objective evaluation shows that the proposed model outperforms other LSTM-based models. From the subjective evaluation, we conclude that altering only the chords hardly changes the overall emotion of the music. Qualitative analysis demonstrates the model's ability to generate varied harmonies.
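The two mechanisms named in the abstract, an attention-based melody context vector and latent variables at the piece and bar levels, can be illustrated with a minimal NumPy sketch. This is not the LHVAE implementation. The dot-product scoring, the dimensions, the function names `melody_context` and `sample_latent`, and the way the bar-level mean depends on the piece-level latent are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def melody_context(decoder_state, melody_states):
    """Attention-weighted summary of the melody for one decoding step.

    decoder_state: shape (d,), hypothetical chord-decoder hidden state.
    melody_states: shape (T, d), hypothetical encoded melody steps.
    Dot-product scoring is assumed here; the paper may use another form.
    """
    scores = melody_states @ decoder_state   # (T,) similarity per melody step
    weights = softmax(scores)                # (T,) attention weights, sum to 1
    context = weights @ melody_states        # (d,) weighted sum of melody states
    return context, weights

def sample_latent(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
d, T, z_dim = 8, 16, 4

melody_states = rng.standard_normal((T, d))
decoder_state = rng.standard_normal(d)
context, weights = melody_context(decoder_state, melody_states)

# Piece-level latent (global properties), sampled from a standard normal.
z_piece = sample_latent(np.zeros(z_dim), np.zeros(z_dim), rng)
# Bar-level latent (local properties); its mean is tied to z_piece by a
# made-up scaling purely to show the hierarchical dependency.
z_bar = sample_latent(0.5 * z_piece, np.zeros(z_dim), rng)
```

In a full model, the means and log-variances would come from learned networks conditioned on the emotion label and the melody, and `context` would be fed to the chord decoder at each step alongside the latent variables.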