Deep neural network-based speech enhancement approaches aim to learn a noisy-to-clean transformation through supervised learning. However, even a well-trained transformation is vulnerable to unseen noises that are not included in the training set. In this work, we focus on the unsupervised noise adaptation problem in speech enhancement, where the ground truth for the target-domain data is completely unavailable. Specifically, we propose a generative adversarial network-based method that efficiently learns the converse, clean-to-noisy transformation from a few minutes of unpaired target-domain data. This transformation is then used to generate sufficient simulated data for domain adaptation of the enhancement model. Experimental results show that our method effectively mitigates the domain mismatch between the training and test sets and surpasses the best baseline by a large margin.
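As a rough illustration of the data-simulation step only, the sketch below pairs each clean utterance with a simulated noisy version, giving (noisy, clean) pairs of the kind a supervised enhancement model could be adapted on. Everything in it is an assumption made for illustration: `placeholder_generator` is a hypothetical stand-in that merely adds Gaussian noise, whereas the method described above would use the GAN-trained clean-to-noisy model in its place, and the function names, noise scale, and toy waveforms do not come from the paper.

```python
import random
from typing import Callable, List, Tuple

Waveform = List[float]
Generator = Callable[[Waveform, random.Random], Waveform]


def placeholder_generator(clean: Waveform, rng: random.Random,
                          noise_scale: float = 0.1) -> Waveform:
    """Hypothetical stand-in for a trained clean-to-noisy generator.

    It only adds Gaussian noise; a real generator would be learned
    adversarially from unpaired target-domain recordings.
    """
    return [sample + rng.gauss(0.0, noise_scale) for sample in clean]


def simulate_pairs(clean_utts: List[Waveform], generator: Generator,
                   rng: random.Random) -> List[Tuple[Waveform, Waveform]]:
    """Build (simulated noisy, clean) pairs for adapting an enhancement model."""
    return [(generator(clean, rng), clean) for clean in clean_utts]


# Toy clean "utterances" (assumed data, for demonstration only).
rng = random.Random(0)
clean_utts = [[0.0, 0.5, -0.5, 0.25], [0.1, -0.1, 0.2]]
pairs = simulate_pairs(clean_utts, placeholder_generator, rng)
```

Each element of `pairs` holds a simulated noisy input and its clean target, so the pairs can be fed to whatever supervised fine-tuning loop the enhancement model already uses.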