In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often over-suppress speech, which introduces audible artifacts and degrades the performance of downstream tasks. To overcome these limitations, we propose a novel approach called Restorative SE (RestSE), which combines a lightweight SE module with a generative codec module to progressively enhance and restore speech quality. The SE module first reduces noise, and the codec module then performs dereverberation and uses its generative capability to restore the speech. We systematically explore various quantization techniques within the codec module to optimize performance. Additionally, we introduce a weighted loss function and a feature fusion mechanism that merges the SE output with the original mixture, particularly in segments where the SE output is heavily distorted. Experimental results demonstrate that the proposed method improves speech quality under adverse conditions. Audio demos are available at: https://sophie091524.github.io/RestorativeSE/.
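The fusion and loss-weighting ideas in the abstract can be illustrated with a minimal sketch. The abstract does not say how distortion is detected or how the loss is weighted, so the per-frame `distortion` score in [0, 1], the linear blending rule, and the `alpha` weight below are all assumptions; `fuse_features` and `weighted_l1` are hypothetical names, not the RestSE implementation.

```python
import numpy as np


def fuse_features(se_feat, mix_feat, distortion):
    """Blend SE-output and mixture features frame by frame.

    Hypothetical sketch: `distortion` is an assumed per-frame score in [0, 1]
    (1 = SE output heavily distorted). Distorted frames lean on the original
    mixture; frames that look clean keep the SE output.
    """
    se_feat = np.asarray(se_feat, dtype=float)    # shape (frames, dims)
    mix_feat = np.asarray(mix_feat, dtype=float)  # shape (frames, dims)
    # Per-frame mixing weight, broadcast across the feature dimension.
    w = np.clip(np.asarray(distortion, dtype=float), 0.0, 1.0)[:, None]
    return (1.0 - w) * se_feat + w * mix_feat


def weighted_l1(pred, target, distortion, alpha=1.0):
    """Frame-weighted L1 loss (assumed form) that up-weights distorted frames."""
    # Assumed weighting: 1 for clean frames, up to 1 + alpha for distorted ones.
    w = 1.0 + alpha * np.clip(np.asarray(distortion, dtype=float), 0.0, 1.0)
    # Mean absolute error per frame, averaged over the feature dimension.
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)).mean(axis=1)
    # Normalize by the total weight so the loss scale stays comparable.
    return float((w * err).sum() / w.sum())
```

Linear blending keeps the fused features on the same scale as the inputs, so a downstream codec would see in-range values regardless of the weights; a learned, non-linear fusion would be an equally plausible alternative.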