In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform and spectrogram denoisers. CleanUNet 2 uses a two-stage framework inspired by popular speech synthesis methods that consist of a waveform model and a spectrogram model. Specifically, CleanUNet 2 builds upon CleanUNet, the state-of-the-art waveform denoiser, and further boosts its performance by taking the spectrograms predicted by a spectrogram denoiser as input. We demonstrate that CleanUNet 2 outperforms previous methods on a range of objective and subjective evaluations.
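The two-stage data flow described above can be illustrated with a minimal sketch. It is not the CleanUNet 2 implementation: both stages are hand-written signal-processing stand-ins for the learned networks, and every name (`magnitude_spectrogram`, `spectrogram_denoiser`, `waveform_denoiser`, `two_stage_denoise`) and the values of `N_FFT` and `HOP` are assumptions made for illustration. Only the overall structure, a spectrogram denoiser whose prediction is fed to a waveform denoiser, comes from the text.

```python
import numpy as np

N_FFT = 512  # assumed STFT window size (illustrative, not from the paper)
HOP = 128    # assumed hop size (illustrative, not from the paper)


def magnitude_spectrogram(wave):
    """Hann-windowed magnitude STFT, shape (frames, N_FFT // 2 + 1).

    Assumes len(wave) >= N_FFT; trailing samples that do not fill a frame are ignored.
    """
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(wave) - N_FFT) // HOP
    frames = np.stack(
        [wave[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def spectrogram_denoiser(noisy_spec):
    """Stage 1 stand-in: subtract a per-bin noise floor (not a learned model)."""
    noise_floor = np.percentile(noisy_spec, 10, axis=0, keepdims=True)
    return np.maximum(noisy_spec - noise_floor, 0.0)


def waveform_denoiser(noisy_wave, predicted_spec):
    """Stage 2 stand-in: use the predicted spectrogram as conditioning.

    The frame-level energy of the prediction is upsampled to the sample rate
    and applied as a gain in [0, 1]. A real waveform denoiser would be a
    trained network; this only shows how the two inputs meet.
    """
    frame_energy = predicted_spec.sum(axis=1)
    frame_gain = frame_energy / (frame_energy.max() + 1e-8)
    frame_centers = np.arange(len(frame_gain)) * HOP + N_FFT // 2
    sample_gain = np.interp(np.arange(len(noisy_wave)), frame_centers, frame_gain)
    return noisy_wave * sample_gain


def two_stage_denoise(noisy_wave):
    """Run stage 1 on the spectrogram, then stage 2 on the waveform."""
    noisy_spec = magnitude_spectrogram(noisy_wave)
    predicted_spec = spectrogram_denoiser(noisy_spec)
    return waveform_denoiser(noisy_wave, predicted_spec)
```

Calling `two_stage_denoise` on a one-dimensional float array returns an array of the same length. In this sketch the spectrogram prediction acts only as side information for the waveform stage, which mirrors the conditioning idea without claiming anything about the actual network architecture.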