Speech bandwidth expansion extends the frequency range of low-bandwidth speech signals, improving audio quality, clarity, and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach based on a high-fidelity generative adversarial network. Unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates multiple bandwidth upsampling ratios into a single unified model designed specifically for speech bandwidth expansion. It performs robustly across a range of bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to demonstrate this capability. Experimental results show that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, indicating its effectiveness in practical speech enhancement applications.
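To make the notions of paired narrowband/wideband signals, upsampling ratios, and the interpolation baseline concrete, the following is a minimal sketch and not the generative adversarial network described above. It assumes a toy sine tone in place of speech, builds the narrowband signal by naive decimation (a real pipeline would apply an anti-aliasing low-pass filter first), and restores it by linear interpolation. The function names `make_narrowband`, `linear_upsample`, and `mse` are hypothetical and chosen only for illustration.

```python
import math


def make_narrowband(wideband, ratio):
    """Keep every `ratio`-th sample (naive decimation, no anti-aliasing filter)."""
    return wideband[::ratio]


def linear_upsample(narrowband, ratio):
    """Interpolation baseline: linearly interpolate to `ratio` times as many samples."""
    out = []
    n = len(narrowband)
    for i in range(n):
        cur = narrowband[i]
        # Past the final sample there is nothing to interpolate toward, so hold it.
        nxt = narrowband[i + 1] if i + 1 < n else cur
        for k in range(ratio):
            out.append(cur + (nxt - cur) * k / ratio)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy "wideband" signal (assumption): a 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
wideband = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(320)]

# Evaluate the same baseline at two upsampling ratios.
errors = {}
for ratio in (2, 4):
    narrow = make_narrowband(wideband, ratio)
    restored = linear_upsample(narrow, ratio)
    errors[ratio] = mse(wideband, restored)
```

In this toy setting the reconstruction error grows with the upsampling ratio, and interpolation cannot recover any frequency content above the narrowband limit, which is the gap a learned model is meant to fill.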