This paper proposes Remixed2Remixed, a domain adaptation method for speech enhancement that uses Noise2Noise (N2N) learning to adapt models trained on artificially generated (out-of-domain, OOD) noisy-clean paired data so that they better separate real-world recorded (in-domain) noisy data. The proposed method uses a teacher model trained on OOD data to obtain pseudo-in-domain speech and noise signals, which are shuffled and remixed twice in each batch to generate two bootstrapped mixtures. The student model is then trained by optimizing an N2N-based cost function computed from these two bootstrapped mixtures. Because this training strategy is similar to the recently proposed RemixIT, we also investigate the effectiveness of the N2N-based loss as a regularizer for RemixIT. Experimental results on the CHiME-7 unsupervised domain adaptation for conversational speech enhancement (UDASE) task show that the proposed method outperformed the challenge baseline system, RemixIT, and reduced the blurring of performance caused by teacher models.
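The remix-twice step and the N2N-style objective described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names `remix_twice` and `n2n_loss` are hypothetical, the teacher's speech and noise estimates are taken as given lists of samples, only the noise estimates are permuted within the batch, and a mean squared error with the second mixture as the target is an assumed stand-in for the paper's actual cost function.

```python
import random


def remix_twice(speech, noise, rng):
    """Build two bootstrapped mixtures from teacher estimates.

    Assumption: each speech estimate is paired with a noise estimate chosen
    by an independent in-batch permutation, once per mixture.
    """
    batch = len(speech)
    perm_a = rng.sample(range(batch), batch)  # first shuffle of noise indices
    perm_b = rng.sample(range(batch), batch)  # second, independent shuffle
    mix_a = [[s + n for s, n in zip(speech[i], noise[perm_a[i]])]
             for i in range(batch)]
    mix_b = [[s + n for s, n in zip(speech[i], noise[perm_b[i]])]
             for i in range(batch)]
    return mix_a, mix_b


def n2n_loss(student, mix_a, mix_b):
    """Mean squared error between student(mix_a) and mix_b.

    Assumption: the student's output on one noisy mixture is compared with
    the other noisy mixture, in the spirit of Noise2Noise training.
    """
    total, count = 0.0, 0
    for a, b in zip(mix_a, mix_b):
        estimate = student(a)
        for e, t in zip(estimate, b):
            total += (e - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for the teacher's pseudo-in-domain speech and noise.
    speech = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    noise = [[0.5, 0.5, 0.5], [-0.5, -0.5, -0.5]]
    mix_a, mix_b = remix_twice(speech, noise, rng)
    # An identity function stands in for the student network.
    print(n2n_loss(lambda x: x, mix_a, mix_b))
```

In a real system the student would be a neural network and the loss would be minimized by gradient descent; the sketch only shows how the two mixtures share the same speech estimates while differing in their noise pairing.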