The goal of single-channel source separation is to reconstruct $K$ sources given their mixture. In supervised settings where vast amounts of clean source data are available, this challenging, ill-posed problem has been addressed successfully by generative diffusion and flow-based prior models. However, access to such clean source samples is often limited, and even when they are available, supervised models are vulnerable to domain shifts. To bridge this gap, we present Separation via Unsupervised Remixing Flow (SURF), an unsupervised flow matching approach for source separation that learns directly from observed mixtures. This method relies on a novel combination of state-of-the-art supervised flow matching and regression-based self-supervised techniques. At a high level, starting from a teacher model, we use a "remixing" step to bootstrap the learning of a student flow model from the teacher's estimates. We provide insights into the objectives optimized by this approach and draw a novel connection to the Wake-Sleep algorithm. Empirical evaluations on image and audio benchmarks demonstrate that SURF establishes a new state of the art, significantly outperforming existing unsupervised methods. See our demo page for examples: https://google.github.io/df-conformer/surf/
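To make the teacher-student "remixing" idea more concrete, the following is a minimal NumPy sketch of one plausible reading of it, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`toy_teacher`, `remix`, `flow_matching_pair`), the trivial even-split teacher, the choice to shuffle each source slot independently across the batch, and the linear-interpolation flow matching path. The abstract does not specify any of these details.

```python
import numpy as np

def toy_teacher(mixtures, num_sources):
    """Hypothetical stand-in teacher: splits each mixture evenly into K estimates."""
    return np.repeat(mixtures[:, None, :] / num_sources, num_sources, axis=1)

def remix(estimates, rng):
    """Assumed remixing: shuffle each source slot across the batch, then re-sum."""
    batch, num_sources, _ = estimates.shape
    shuffled = np.stack(
        [estimates[rng.permutation(batch), k] for k in range(num_sources)], axis=1
    )
    # New mixtures whose constituent (pseudo) sources are known by construction.
    return shuffled.sum(axis=1), shuffled

def flow_matching_pair(sources, rng):
    """Linear-path flow matching inputs: noisy point x_t, time t, target velocity."""
    noise = rng.standard_normal(sources.shape)
    t = rng.uniform(size=(sources.shape[0], 1, 1))
    x_t = (1.0 - t) * noise + t * sources
    return x_t, t, sources - noise

rng = np.random.default_rng(0)
mixtures = rng.standard_normal((8, 16))            # batch of observed mixtures
estimates = toy_teacher(mixtures, num_sources=2)   # shape (batch, K, dim)
new_mixtures, pseudo_sources = remix(estimates, rng)
x_t, t, velocity = flow_matching_pair(pseudo_sources, rng)
```

In this sketch, a student flow model would be trained to predict `velocity` from `x_t`, `t`, and `new_mixtures`, so that it only ever sees mixtures and teacher estimates rather than clean sources.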