The two-stage pipeline is popular in speech enhancement because it outperforms traditional single-stage methods. Current two-stage approaches usually enhance the magnitude spectrum in the first stage, then refine the complex spectrum in the second stage to suppress residual noise and recover the speech phase. This entire process is performed in the short-time Fourier transform (STFT) spectrum domain. In this paper, we re-implement the second stage in the short-time discrete cosine transform (STDCT) spectrum domain, because we have found that STDCT offers stronger noise suppression capability than STFT. Additionally, the implicit phase of STDCT makes phase recovery simpler and more efficient, whereas phase recovery is challenging and computationally expensive in STFT-based methods. We therefore propose a novel two-stage framework, the STFT-STDCT spectrum fusion network (FDFNet), for speech enhancement in the cross-spectrum domain. Experimental results demonstrate that FDFNet outperforms previous two-stage methods and also achieves superior performance compared with other advanced systems.
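The "implicit phase" property of STDCT can be illustrated with a minimal analysis/synthesis sketch. It assumes a 512-sample frame, 50% overlap, a square-root Hann window, and SciPy's orthonormal DCT-II. These parameters and the function names `stdct` and `istdct` are illustrative assumptions, not FDFNet's configuration. The coefficients are real-valued: their signs and magnitudes together carry what STFT splits into a magnitude spectrum and a phase spectrum. As a result, a real-valued gain can act on them directly, and the waveform is recovered by an inverse DCT and overlap-add without estimating a separate phase.

```python
import numpy as np
from scipy.fft import dct, idct


def _sqrt_hann(frame_len: int) -> np.ndarray:
    # Periodic Hann window; its square root is used for both analysis and
    # synthesis, so their product (a Hann window) overlap-adds to 1 at 50% overlap.
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))


def stdct(x: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Hypothetical short-time DCT: window overlapping frames, DCT-II each one.

    Returns a real-valued array of shape (num_frames, frame_len).
    """
    if frame_len != 2 * hop:
        raise ValueError("this sketch assumes 50% overlap (frame_len == 2 * hop)")
    window = _sqrt_hann(frame_len)
    # Pad so that every original sample is covered by exactly two frames.
    num_frames = int(np.ceil(len(x) / hop)) + 1
    padded = np.zeros((num_frames + 1) * hop)
    padded[hop:hop + len(x)] = x
    frames = np.stack([padded[i * hop:i * hop + frame_len] for i in range(num_frames)])
    # Orthonormal DCT-II per frame: real coefficients, no separate phase spectrum.
    return dct(frames * window, type=2, norm="ortho", axis=-1)


def istdct(coeffs: np.ndarray, length: int, hop: int = 256) -> np.ndarray:
    """Hypothetical inverse: inverse DCT per frame, synthesis window, overlap-add."""
    frame_len = coeffs.shape[-1]
    if frame_len != 2 * hop:
        raise ValueError("this sketch assumes 50% overlap (frame_len == 2 * hop)")
    window = _sqrt_hann(frame_len)
    frames = idct(coeffs, type=2, norm="ortho", axis=-1) * window
    out = np.zeros((coeffs.shape[0] + 1) * hop)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    # Drop the leading pad and any trailing pad.
    return out[hop:hop + length]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)  # 1 s of white noise at an assumed 16 kHz
    coeffs = stdct(x)
    # A real-valued gain (a placeholder 0.5, not a learned mask) is applied
    # to the coefficients directly; no phase estimation step is needed.
    y = istdct(0.5 * coeffs, len(x))
    print(coeffs.dtype, np.max(np.abs(istdct(coeffs, len(x)) - x)))
```

With the analysis and synthesis windows chosen so that their product overlap-adds to one, unmodified coefficients reconstruct the input up to floating-point error. In an STFT pipeline, by contrast, the same gain step would require a complex-valued mask or an explicit phase estimate.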