UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

In practical applications of speech codecs, factors such as the quality of the radio connection, hardware limitations, or the required user experience necessitate trade-offs between achievable perceptual quality, resulting bitrate, and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance the perceptual quality of coded speech, bandwidth extension (BWE) of the transmitted speech is an attractive and popular technique in conventional speech coding. In contrast, neural speech codecs are typically trained end-to-end for a specific set of requirements and are often not easily adaptable; in particular, they are usually trained to operate at a single fixed sampling rate. With the Universal Bandwidth Extension Generative Adversarial Network (UBGAN), we propose a modular and lightweight GAN-based solution that increases the operational flexibility of a wide range of conventional and neural codecs. Our model operates in the subband domain and extends the bandwidth of WB signals from 8 kHz to 16 kHz, resulting in super-wideband (SWB) signals. We further introduce two variants, guided-UBGAN and blind-UBGAN: the guided version transmits a quantized learned representation as side information at a very low bitrate, in addition to the bitrate of the codec, while blind-UBGAN operates without such side information. Our subjective assessments demonstrate the advantage of UBGAN applied to WB codecs and highlight the generalization capacity of the proposed method across multiple codecs and bitrates.
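
The bandwidths above refer to audio bandwidth. A minimal NumPy sketch, assuming the usual sampling rates of 16 kHz for WB and 32 kHz for SWB, shows what a BWE model is asked to do: band-limited upsampling of a WB signal to the SWB rate leaves the 8 to 16 kHz band empty, and that high band is what blind or guided BWE must synthesize. The ideal DFT-domain two-band split used here is only an illustrative stand-in; the abstract does not specify UBGAN's subband filterbank, and the names `upsample_fft` and `split_bands` are hypothetical.

```python
import numpy as np

# Assumed sampling rates (not stated in the abstract):
WB_RATE = 16_000   # WB: 16 kHz sampling -> audio bandwidth up to 8 kHz
SWB_RATE = 32_000  # SWB: 32 kHz sampling -> audio bandwidth up to 16 kHz


def upsample_fft(x: np.ndarray, factor: int) -> np.ndarray:
    """Band-limited upsampling by zero-padding the spectrum (ideal lowpass).

    Hypothetical helper: the new upper frequency region is left empty.
    """
    n = len(x)
    spec = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spec)] = spec
    # irfft normalizes by the new length, so rescale to keep the amplitude.
    return np.fft.irfft(padded, n * factor) * factor


def split_bands(x: np.ndarray, rate: int, cutoff_hz: float):
    """Ideal two-band split in the DFT domain; returns (low band, high band).

    Hypothetical stand-in for a real subband filterbank.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    low = np.where(freqs < cutoff_hz, spec, 0)
    high = spec - low
    return np.fft.irfft(low, len(x)), np.fft.irfft(high, len(x))


# A 1 kHz tone sampled at the WB rate, upsampled to the SWB rate:
t_wb = np.arange(1600) / WB_RATE
wb = np.sin(2 * np.pi * 1000 * t_wb)
wb_at_swb_rate = upsample_fft(wb, SWB_RATE // WB_RATE)
low, high = split_bands(wb_at_swb_rate, SWB_RATE, cutoff_hz=8000)
# 'high' is (numerically) zero: the 8 to 16 kHz band carries no content
# until a BWE model generates it.
```

In a streaming codec, a frame-based filterbank would normally replace the whole-signal DFT used here; the sketch only illustrates which part of the spectrum BWE is responsible for.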
