Neural networks have recently proven effective for speech coding at low bitrates. However, under-utilization of intra-frame correlations and quantization error both degrade the reconstructed audio quality. To improve coding quality, we present an end-to-end neural speech codec, CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure of 1D-CNN and Intra-BRNN layers is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce quantization noise. CBRC encodes audio in 20 ms frames with no additional latency, which makes it suitable for real-time communication. Experimental results show that CBRC at 3 kbps outperforms Opus at 12 kbps.
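To make the quantizer idea concrete, the following is a minimal sketch of residual vector quantization with a group-wise split and a beam search over codewords. It is not the authors' GB-RVQ implementation: the function names `rvq_beam_search` and `group_rvq`, the random codebooks, and the layout of 2 groups, 3 stages, and 1024-entry codebooks are all assumptions chosen for illustration. The only figures taken from the abstract are the 20 ms frame and the 3 kbps bitrate, which together give a budget of 60 bits per frame.

```python
import numpy as np

def rvq_beam_search(x, codebooks, beam_size=1):
    """Quantize vector x with a stack of codebooks (residual VQ).

    beam_size=1 is plain greedy RVQ; larger values keep several candidate
    code sequences per stage and return the best one found at the end.
    """
    # Each beam entry is (chosen indices so far, remaining residual).
    beams = [([], x.copy())]
    for cb in codebooks:  # cb has shape (K, D)
        candidates = []
        for indices, residual in beams:
            # Squared distance from this residual to every codeword.
            dists = np.sum((cb - residual) ** 2, axis=1)
            # Expand this path with its beam_size closest codewords.
            for k in np.argsort(dists)[:beam_size]:
                candidates.append((indices + [int(k)], residual - cb[k], dists[k]))
        # Keep the beam_size candidates with the smallest residual energy.
        candidates.sort(key=lambda c: c[2])
        beams = [(idx, res) for idx, res, _ in candidates[:beam_size]]
    best_indices, best_residual = beams[0]
    return best_indices, float(np.sum(best_residual ** 2))

def group_rvq(x, group_codebooks, beam_size=1):
    """Split x into equal-sized groups and run RVQ on each group separately."""
    groups = np.split(x, len(group_codebooks))
    results = [rvq_beam_search(g, cbs, beam_size)
               for g, cbs in zip(groups, group_codebooks)]
    indices = [idx for idx, _ in results]
    error = sum(err for _, err in results)
    return indices, error

# Bit budget from the abstract: 3 kbps with 20 ms frames.
FRAME_MS, BITRATE_BPS = 20, 3000
bits_per_frame = BITRATE_BPS * FRAME_MS // 1000

# Hypothetical layout that fills this budget (not taken from the paper).
GROUPS, STAGES, CODEBOOK_SIZE, GROUP_DIM = 2, 3, 1024, 8
bits_used = GROUPS * STAGES * (CODEBOOK_SIZE - 1).bit_length()

rng = np.random.default_rng(0)
# Random codebooks stand in for trained ones; later stages are scaled down
# because they model progressively smaller residuals.
group_codebooks = [
    [rng.standard_normal((CODEBOOK_SIZE, GROUP_DIM)) * 0.5 ** s for s in range(STAGES)]
    for _ in range(GROUPS)
]
x = rng.standard_normal(GROUPS * GROUP_DIM)  # stand-in for one frame's latent vector
indices, error = group_rvq(x, group_codebooks, beam_size=4)
```

With `beam_size=1` the search reduces to ordinary greedy RVQ. A wider beam can find a lower-error code sequence than the greedy choice, although this is not guaranteed for every input, and the search cost grows roughly linearly with the beam width. Splitting the vector into groups keeps each codebook search in a lower-dimensional space.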