Speech bandwidth extension (BWE) widens the frequency range of a speech signal, making the speech sound brighter and fuller. This paper proposes AP-BWE, a generative adversarial network (GAN) based BWE model that predicts Amplitude and Phase spectra in parallel and generates wideband speech waveforms with both high quality and high efficiency. The AP-BWE generator is built entirely from convolutional neural networks (CNNs). It uses a dual-stream architecture with mutual interaction: the amplitude stream and the phase stream exchange information while extending the high-frequency components of the input narrowband amplitude and phase spectra, respectively. To improve the naturalness of the extended speech, we employ a multi-period discriminator at the waveform level and design a pair of multi-resolution discriminators, one for amplitude and one for phase, at the spectral level. Experimental results demonstrate that AP-BWE achieves state-of-the-art speech quality on BWE tasks targeting sampling rates of both 16 kHz and 48 kHz. In terms of generation efficiency, because the architecture is all-convolutional and every operation runs at the frame level, AP-BWE generates 48 kHz waveform samples 292.3 times faster than real-time on a single RTX 4090 GPU and 18.1 times faster than real-time on a single CPU. Notably, to our knowledge, AP-BWE is the first model to directly extend the high-frequency phase spectrum, which is beneficial for improving the effectiveness of existing BWE methods.
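To make the amplitude and phase terminology concrete, the following minimal sketch splits the spectrum of one toy frame into an amplitude spectrum and a phase spectrum and then rebuilds the frame from the two. It is an illustration only, not the AP-BWE generator: it assumes a naive single-frame DFT with no windowing or overlap (a real system would use a short-time Fourier transform), and the helper names `dft`, `idft`, `split_amplitude_phase`, and `combine_amplitude_phase` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def split_amplitude_phase(spectrum):
    """Return (amplitude, phase) lists; phase is wrapped to [-pi, pi]."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def combine_amplitude_phase(amplitude, phase):
    """Rebuild the complex spectrum A * exp(j * P) from its two parts."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


# Toy frame (an assumption for illustration): two sinusoids at bins 2 and 5.
N = 16
frame = [
    math.sin(2 * math.pi * 2 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
    for t in range(N)
]

amplitude, phase = split_amplitude_phase(dft(frame))
rebuilt = idft(combine_amplitude_phase(amplitude, phase))

# Largest sample-wise difference between the original and rebuilt frame.
max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))
```

Because the amplitude and phase together determine the complex spectrum, a model that outputs both can recover a waveform through an inverse transform alone, which is consistent with the frame-level efficiency described in the abstract.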