Zero-Shot Duet Singing Voices Separation with Diffusion Models

Recent studies have shown that diffusion models are promising priors for solving audio inverse problems. By manipulating the diffusion process, these models allow us to sample from the posterior distribution of a target signal given an observed signal. However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion model may not be sufficient to keep the source identity consistent in the separated audio. For example, a separated track may occasionally switch from one singer to the other. Solving this problem would help separate sources in a choir, or in a mixture of multiple instruments with similar timbre, without acquiring large amounts of paired data. In this paper, we examine the problem in the context of duet singing voice separation and propose a method that enforces the coherence of singer identity: the mixture is split into overlapping segments, and posterior sampling is performed in an auto-regressive manner, with each segment conditioned on the previous one. We evaluate the proposed method on the MedleyVox dataset and show that it outperforms the naive posterior sampling baseline. Our source code and pre-trained model are publicly available at https://github.com/yoyololicon/duet-svs-diffusion.
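The segment-wise procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `separate_autoregressively`, the `sampler` callable, and the parameters `seg_len` and `overlap` are all assumptions made for illustration, and `toy_sampler` is a trivial placeholder where a diffusion posterior sampler would go. The sketch only shows the bookkeeping: the mixture is cut into overlapping segments, and each call receives the already-separated overlap from the previous segment as context.

```python
import numpy as np


def separate_autoregressively(mixture, sampler, seg_len, overlap):
    """Separate a 1-D mixture segment by segment (illustrative sketch).

    `sampler(mix_seg, context)` is a hypothetical callable standing in for
    diffusion posterior sampling. It must return an array of shape
    (n_sources, len(mix_seg)). `context` is None for the first segment;
    afterwards it holds the already-separated sources over the first
    `overlap` samples of `mix_seg`, with shape (n_sources, overlap).
    """
    if not 0 < overlap < seg_len:
        raise ValueError("need 0 < overlap < seg_len")
    n = len(mixture)
    hop = seg_len - overlap
    out, context, start = None, None, 0
    while True:
        end = min(start + seg_len, n)
        est = np.asarray(sampler(mixture[start:end], context))
        if out is None:
            out = np.zeros((est.shape[0], n))
        # Keep what was already written for the overlap; append only the new part.
        skip = 0 if context is None else overlap
        out[:, start + skip:end] = est[:, skip:]
        if end >= n:
            return out
        # The tail of the current result becomes the context for the next segment.
        context = out[:, end - overlap:end]
        start += hop


def toy_sampler(mix_seg, context):
    """Placeholder, NOT a diffusion model: assigns half the mixture to each source."""
    half = 0.5 * np.asarray(mix_seg, dtype=float)
    return np.stack([half, half])


mixture = np.arange(10, dtype=float)
sources = separate_autoregressively(mixture, toy_sampler, seg_len=4, overlap=2)
```

In a real system, the sampler would presumably use `context` to constrain the overlapping region during sampling so that each output track stays with the same singer across segments. How that conditioning is done is specific to the paper's method and is not reproduced here.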

