Reducing the bandwidth of speech is common practice in resource-constrained environments such as low-bandwidth speech transmission or low-complexity vocoding. We propose a lightweight and robust method for extending the bandwidth of wideband speech signals, inspired by classical methods developed in the speech coding context. The resulting model has just $\sim 370$~K parameters and a complexity of $\sim 140$~MFLOPS (or $\sim 70$~MMACS). With a frame size of 10 ms and a lookahead of just 0.27 ms, the model is well suited for use with common wideband speech codecs. We evaluate the model's robustness by pairing it with the Opus SILK speech codec (Opus 1.5 release) and verify in a P.808 DCR listening test that it significantly improves quality at bitrates from 6 to 12 kb/s. We also demonstrate that Opus 1.5 together with the proposed bandwidth extension at 9 kb/s meets the quality of 3GPP EVS at 9.6 kb/s and that of Opus 1.4 at 18 kb/s, showing that blind bandwidth extension can meet the quality of classical guided bandwidth extension.