MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text adherence and audio quality with help from selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality account for only part of these preferences. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.
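The abstract does not say how the 300,000 pairwise preferences are turned into a reward signal. As a rough illustration only, the sketch below assumes the Bradley-Terry style objective commonly used for reward models in RLHF: the reward of the preferred clip should exceed the reward of the rejected one. The function names and the scalar rewards are hypothetical and are not taken from the paper.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one observed preference.

    Assumes a Bradley-Terry model (an assumption, not stated in the abstract):
    P(preferred beats rejected) = sigmoid(reward_preferred - reward_rejected).
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(reward_pairs):
    """Average the loss over (reward_preferred, reward_rejected) pairs."""
    return sum(preference_loss(p, r) for p, r in reward_pairs) / len(reward_pairs)


# Hypothetical reward-model scores for three preference pairs.
example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
example_loss = mean_preference_loss(example_pairs)
```

Under this assumed objective, a pair the reward model ranks correctly by a wide margin contributes a loss near zero, while a pair it ranks the wrong way contributes a loss that grows roughly linearly with the margin.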

