We introduce MelodyFlow, an efficient text-controllable, high-fidelity music generation and editing model. It operates on continuous latent representations from a low-frame-rate 48 kHz stereo variational autoencoder codec. Built on a diffusion transformer architecture trained with a flow-matching objective, the model can edit diverse, high-quality stereo samples of variable duration using simple text descriptions. We adapt the ReNoise latent inversion method to flow matching and compare it with the original implementation and naive denoising diffusion implicit model (DDIM) inversion on a variety of music editing prompts. Our results indicate that our latent inversion outperforms both ReNoise and DDIM inversion for zero-shot, test-time, text-guided editing on several objective metrics. Subjective evaluations show a substantial improvement over the previous state of the art in music editing. Code and model weights will be made publicly available. Samples are available at https://melodyflow.github.io.
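To make the editing pipeline concrete, the sketch below illustrates the general idea behind latent inversion under a flow-matching model: the learned velocity field is integrated backwards from a clean latent towards the Gaussian prior, and the recovered prior latent can then be re-integrated forward under an edited text prompt. This is a minimal, hypothetical Euler-step illustration, not the paper's method; the `velocity_model` interface, step count, and time convention are assumptions for exposition only.

```python
import torch

@torch.no_grad()
def invert_latent(velocity_model, x_data, text_emb, num_steps=50):
    """Minimal sketch of flow-matching latent inversion (hypothetical API).

    Integrates the learned velocity field backwards from the clean latent
    (t = 1, data) towards the prior (t = 0, noise), the flow-matching
    analogue of DDIM inversion.
    """
    x = x_data.clone()
    dt = 1.0 / num_steps
    for i in reversed(range(num_steps)):
        # Time of the current state, walking from t = 1 down to t = 0.
        t = torch.full((x.shape[0],), (i + 1) * dt, device=x.device)
        v = velocity_model(x, t, text_emb)  # predicted velocity dx/dt
        x = x - v * dt                      # one Euler step backwards in time
    return x  # approximate prior latent encoding the original sample
```

Re-running the same ODE forwards from the returned latent, but conditioned on a new text description, would then yield an edited version of the original sample.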