Despite significant progress over the last decade, deep neural network (DNN) based speech enhancement (SE) still suffers notable degradation in the quality of the recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. A reliable pitch estimate is obtained from the intermediate output, which retains more speech components than the coarse estimate while having a significantly higher SNR than the noisy input speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantage of the proposed model. A multi-modal speech extraction system based on the proposed backbone model ranks first in the ICASSP 2024 MISP Challenge: https://mispchallenge.github.io/mispchallenge2023/index.html.