Deep Generative Models of Music Expectation

A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have either computed exact probabilities from hand-crafted features or been restricted to linear models, which are likely insufficient to represent the complex conditional distributions present in music. In this work, we propose to use a modern deep probabilistic generative model, in the form of a Diffusion Model, to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model, parameterized by deep neural networks, is able to learn complex non-linear features directly from the training set itself. In doing so, we expect such models to represent the 'surprisal' of music for human listeners more accurately. It is known from the literature that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. We show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state-of-the-art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.
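As a rough illustration of the inverted U-shaped relationship described above, the sketch below computes surprisal as the negative log-probability and fits a quadratic to liking ratings as a function of mean surprisal. It is a minimal sketch under stated assumptions, not the procedure used in this work: the data are synthetic, and the helper names `surprisal` and `fit_quadratic` are hypothetical. A negative leading coefficient corresponds to the 'negative quadratic' (inverted U) shape.

```python
import numpy as np


def surprisal(probabilities):
    """Return per-event surprisal in nats, defined as -log(p)."""
    p = np.asarray(probabilities, dtype=float)
    return -np.log(p)


def fit_quadratic(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    a, b, c = np.polyfit(x, y, deg=2)
    return a, b, c


# Synthetic, illustrative data only (an assumption, not measured ratings):
# liking follows an exact inverted U that peaks at a mean surprisal of 3.
mean_surprisal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
liking = -(mean_surprisal - 3.0) ** 2 + 5.0

a, b, c = fit_quadratic(mean_surprisal, liking)

# For a < 0 the fitted parabola opens downward and peaks at -b / (2a).
peak = -b / (2.0 * a)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, peak surprisal={peak:.3f}")
```

With real data, the per-song surprisal would come from a model's (approximate) likelihood rather than from hand-picked values, and the fit quality would be assessed with a suitable statistic.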

