Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI

Generative AI models for music, and for the arts in general, are increasingly complex and hard to understand. The field of eXplainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on them. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE), latent space configurations (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, Classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that MeasureVAE has better reconstruction performance than AdversarialVAE, whereas AdversarialVAE has better musical attribute independence. Results demonstrate that MeasureVAE was able to generate music across genres with interpretable musical dimensions of control, and that it performs best with low-complexity music such as pop and rock. We recommend a 32- or 64-dimensional latent space for 4 regularised dimensions when using MeasureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music, and they can be used to help select and configure AI models, musical features, and datasets for more understandable generation of music.
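The abstract does not say how the musical attributes are imposed on the latent space. As an illustration only, the sketch below shows one commonly used technique, attribute regularisation, in which a chosen latent dimension is encouraged to increase monotonically with a musical attribute across a batch. Whether the models compared here use exactly this loss is an assumption; the function name `attribute_regularisation_loss`, the scale parameter `delta`, and the note-density values are hypothetical and do not come from the paper.

```python
import numpy as np


def attribute_regularisation_loss(z_dim, attribute, delta=1.0):
    """Illustrative loss (an assumption, not taken from the paper).

    Compares the tanh-scaled pairwise differences of one latent dimension
    with the sign of the pairwise differences of a musical attribute.
    The loss is small when the latent dimension orders the batch in the
    same way as the attribute does.
    """
    z_dim = np.asarray(z_dim, dtype=float)
    attribute = np.asarray(attribute, dtype=float)
    # Entry (i, j) holds x[i] - x[j] for every pair of items in the batch.
    dz = z_dim[:, None] - z_dim[None, :]
    da = attribute[:, None] - attribute[None, :]
    # Mean absolute error between the two pairwise matrices.
    return float(np.mean(np.abs(np.tanh(delta * dz) - np.sign(da))))


# Hypothetical batch of four measures with made-up note-density values.
note_density = np.array([0.1, 0.4, 0.7, 0.9])

# Latent values that rise with note density (ordering matches the attribute).
aligned = attribute_regularisation_loss(
    np.array([-2.0, -0.5, 0.5, 2.0]), note_density, delta=10.0
)

# Latent values that fall as note density rises (ordering is reversed).
reversed_order = attribute_regularisation_loss(
    np.array([2.0, 0.5, -0.5, -2.0]), note_density, delta=10.0
)

print(f"aligned ordering:  {aligned:.4f}")
print(f"reversed ordering: {reversed_order:.4f}")
```

In a training setup of this kind, such a term would typically be added to the usual VAE objective once per regularised dimension (2 or 4 in the configurations described above), leaving the remaining latent dimensions unconstrained.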

