Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models assume a noise-free environment: they recover the spectrogram of the high-frequency part of the signal and concatenate it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where noise is unavoidable. To address this problem, we propose Super Denoise Net (SDNet), a neural network that jointly performs super-resolution and noise reduction on a low-sampling-rate signal. To that end, we design gated convolution blocks to enhance the repair capability and lattice convolution blocks to capture information along the time and frequency axes. Experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, achieving higher objective and subjective scores.
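To make the conventional pipeline described above concrete, the following is a minimal sketch of the band-concatenation step, not SDNet itself. The function name `concat_bands`, the array sizes, and the random stand-in for a model's predicted high band are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def concat_bands(low_band, predicted_high_band):
    """Stack a predicted high-frequency band above the observed low band.

    Both inputs are magnitude spectrograms shaped (freq_bins, frames).
    The name and signature are hypothetical, chosen for illustration.
    """
    if low_band.shape[1] != predicted_high_band.shape[1]:
        raise ValueError("both bands must have the same number of frames")
    # The low band is copied through unchanged; only the high band is new.
    return np.concatenate([low_band, predicted_high_band], axis=0)

# Assumed sizes: upsampling 8 kHz -> 16 kHz with a 512-point FFT at 16 kHz
# gives 257 frequency bins, of which the lower 129 cover 0-4 kHz.
rng = np.random.default_rng(0)
low = np.abs(rng.standard_normal((129, 100)))   # observed LR spectrogram
high = np.abs(rng.standard_normal((128, 100)))  # stand-in for a model output
full = concat_bands(low, high)
```

Because the low band passes through untouched, any noise present in the LR input survives in the output, which is the limitation that motivates treating super-resolution and denoising jointly.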