Universal sound separation consists of separating mixtures of arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source-agnostic models that do so. In this work, we complement PIT with adversarial losses but find this challenging with the standard formulation used in speech source separation. We overcome this challenge with a novel I-replacement context-based adversarial loss, and by training with multiple discriminators. Our experiments show that by simply improving the loss (keeping the same model and dataset) we obtain a non-negligible improvement of 1.4 dB SI-SNRi on the reverberant FUSS dataset. We also find adversarial PIT to be effective at reducing spectral holes, which are ubiquitous in mask-based separation models; this highlights the potential relevance of adversarial losses for source separation.
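For readers unfamiliar with PIT, the following is a minimal NumPy sketch of the general idea: score every assignment of estimated sources to reference sources and keep the best one. It is an illustration only and not the training code used in this work. The function names `si_snr` and `pit_si_snr`, the use of mean SI-SNR as the score, and the fixed number of sources are all assumptions made for the example.

```python
import itertools

import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the residual counts as noise.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def pit_si_snr(estimates, targets):
    """Best mean SI-SNR over all orderings of the estimated sources.

    estimates, targets: arrays of shape (n_sources, n_samples).
    Returns (best_score, best_perm), where best_perm[i] is the index of
    the estimate assigned to targets[i].
    """
    n_sources = targets.shape[0]
    best_score, best_perm = -np.inf, None
    # Exhaustive search; fine for a handful of sources (n! permutations).
    for perm in itertools.permutations(range(n_sources)):
        score = np.mean(
            [si_snr(estimates[p], targets[i]) for i, p in enumerate(perm)]
        )
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

In training, the negative of the best score would typically serve as the loss. The improvement variant, SI-SNRi, is commonly computed as the SI-SNR of the estimate minus the SI-SNR of the unprocessed mixture against the same reference, for example `si_snr(estimate, target) - si_snr(mixture, target)`.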