Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To combine the strengths of both approaches and address these limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach modifies the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen introduces only 8% new parameters relative to the original MusicGen model and trains for only 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.
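To make the architectural idea concrete, the following is a minimal, hypothetical sketch of the dual-fusion design described above: a decoder layer is augmented with a text fusion module (cross-attention over instruction-text embeddings) and an audio fusion module (cross-attention over embeddings of the input audio). The module names, dimensions, and wiring here are illustrative assumptions, not the actual Instruct-MusicGen implementation.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Residual cross-attention from the music stream into a conditioning stream.

    Hypothetical module: queries come from the decoder's hidden states,
    keys/values from the conditioning (text or audio) embeddings.
    """
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query=x, key=cond, value=cond)
        return self.norm(x + out)

class InstructDecoderLayer(nn.Module):
    """One decoder layer with added text- and audio-fusion modules.

    Only the two fusion blocks would be new (trainable) parameters;
    the base self-attention stands in for the pretrained MusicGen layer.
    """
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.text_fusion = FusionBlock(d_model)   # conditions on the instruction
        self.audio_fusion = FusionBlock(d_model)  # conditions on the input music
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, text_emb, audio_emb):
        h, _ = self.self_attn(x, x, x)
        h = self.norm(x + h)
        h = self.text_fusion(h, text_emb)
        h = self.audio_fusion(h, audio_emb)
        return h

layer = InstructDecoderLayer()
x = torch.randn(2, 16, 64)      # (batch, music tokens, dim)
text = torch.randn(2, 8, 64)    # instruction-text embeddings
audio = torch.randn(2, 16, 64)  # input-audio embeddings
y = layer(x, text, audio)
print(y.shape)                  # torch.Size([2, 16, 64])
```

The key design point this sketch illustrates is parameter efficiency: because conditioning is injected through small side modules rather than by retraining the backbone, only the fusion blocks need to be trained, consistent with the abstract's claim of adding few new parameters.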