With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its strong potential for handling unseen test noises. However, existing efforts mainly focus on the inherent properties of clean speech and underexploit the varying noise information found in the real world. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process of the diffusion model. Specifically, we design a noise classification (NC) model that produces an acoustic embedding, which serves as a noise conditioner to guide the reverse denoising process. In addition, a multi-task learning scheme is devised to jointly optimize the SE and NC tasks, enhancing the noise specificity of the conditioner. NASE is shown to be a plug-and-play module that can be generalized to any diffusion SE model. Experiments on the VB-DEMAND dataset show that NASE effectively improves multiple mainstream diffusion SE models, especially on unseen noises.
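The multi-task scheme described above can be illustrated with a minimal sketch of a joint objective. This is not the authors' implementation: it assumes the SE loss can be stood in for by a mean squared error, that the NC loss is a softmax cross-entropy over noise classes, and that the two are combined as a weighted sum. The function names and the `nc_weight` value are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error, used here as a stand-in for the diffusion SE loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy for the noise classification (NC) head."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(se_pred, se_target, nc_logits, noise_label, nc_weight=0.3):
    """Weighted sum of the SE and NC losses.

    nc_weight is a hypothetical value; the abstract does not specify one.
    """
    se_loss = mse(se_pred, se_target)
    nc_loss = cross_entropy(nc_logits, noise_label)
    return se_loss + nc_weight * nc_loss


if __name__ == "__main__":
    # Toy values: a 2-sample SE prediction and a 2-class noise classifier.
    loss = multitask_loss([1.0, 2.0], [1.0, 0.0], [0.0, 0.0], 0, nc_weight=0.5)
    print(f"joint loss: {loss:.4f}")
```

In a setup like this, the gradient of the NC term would flow into the encoder that produces the noise embedding, which is one plausible way a joint objective could make the conditioner more noise-specific.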