We present Self-Remixing, a novel self-supervised speech separation method that refines a pre-trained separation model in an unsupervised manner. The proposed method consists of a shuffler module and a solver module, which improve together through separation and remixing processes. Specifically, the shuffler first separates observed mixtures and creates pseudo-mixtures by shuffling and remixing the separated signals. The solver then separates the pseudo-mixtures and remixes the separated signals back into the observed mixtures. The solver is trained using the observed mixtures as supervision, while the shuffler's weights are updated as a moving average of the solver's weights, so that the shuffler generates pseudo-mixtures with fewer distortions. Our experiments demonstrate that Self-Remixing outperforms existing remixing-based self-supervised methods at the same or lower training cost in the unsupervised setup. Self-Remixing also outperforms baselines in semi-supervised domain adaptation, showing its effectiveness in multiple setups.
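The shuffle, remix, and moving-average steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array layout `(batch, n_src, time)`, the decay value, and the mean-squared-error loss are all assumptions made for illustration. The neural separators are omitted, and an ideal solver (one that returns exactly the shuffled sources) stands in for the solver's output. Any alignment of the solver's output order with the shuffler's is also left out.

```python
import numpy as np


def shuffle_and_remix(sources, rng):
    """Shuffle each source slot across the batch and sum into pseudo-mixtures.

    sources: array of shape (batch, n_src, time), e.g. the shuffler's outputs.
    Returns (pseudo_mixtures, shuffled_sources, perms).
    """
    batch, n_src, _ = sources.shape
    # One independent batch permutation per source slot: shape (n_src, batch).
    perms = np.stack([rng.permutation(batch) for _ in range(n_src)])
    # shuffled[b, n] = sources[perms[n][b], n]
    shuffled = np.stack([sources[perms[n], n] for n in range(n_src)], axis=1)
    return shuffled.sum(axis=1), shuffled, perms


def unshuffle_and_remix(separated, perms):
    """Undo the per-slot permutations and sum back into mixture estimates."""
    restored = np.empty_like(separated)
    for n in range(separated.shape[1]):
        # Inverse of the shuffle: send row b back to position perms[n][b].
        restored[perms[n], n] = separated[:, n]
    return restored.sum(axis=1)


def ema_update(shuffler_w, solver_w, decay=0.999):
    """Move the shuffler's weights toward the solver's (hypothetical decay)."""
    return {k: decay * shuffler_w[k] + (1.0 - decay) * solver_w[k]
            for k in shuffler_w}


# Toy run: random "separated" signals stand in for the shuffler's outputs.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 2, 8))
observed = sources.sum(axis=1)  # observed mixtures, shape (4, 8)

pseudo, shuffled, perms = shuffle_and_remix(sources, rng)
# An ideal solver would recover `shuffled` from `pseudo`; reuse it directly.
reconstructed = unshuffle_and_remix(shuffled, perms)
# Assumed loss: mean squared error against the observed mixtures.
loss = float(np.mean((reconstructed - observed) ** 2))
```

With an ideal solver the reconstruction matches the observed mixtures, so the loss is zero up to floating-point rounding. In training, the solver's imperfect outputs would make this loss nonzero, and it is this mismatch with the observed mixtures that supervises the solver.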