Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of eXplainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on them. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE), configurations of latent space in the AI model (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, Classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that MeasureVAE has better reconstruction performance than AdversarialVAE, which in turn has better musical attribute independence. Results demonstrate that MeasureVAE was able to generate music across music genres with interpretable musical dimensions of control, and performed best with low-complexity music such as pop and rock. We recommend a 32- or 64-dimensional latent space for 4 regularised dimensions when using MeasureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music and can be used to help select and configure AI models, musical features, and datasets for more understandable generation of music.