Neural speech coding is a rapidly developing field in which state-of-the-art approaches now achieve better compression performance than conventional methods. Despite significant progress, existing methods still struggle to preserve and reconstruct fine details, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that achieves state-of-the-art performance at low bitrates. It employs a novel back projection method with selective feature fusion for augmented representation. Specifically, we propose Selective Up-sampling Back Projection (SUBP) and Selective Down-sampling Back Projection (SDBP) modules to replace the standard up-sampling layers in the decoder and the standard down-sampling layers in the encoder, respectively. Experimental results show that our method outperforms existing neural speech codecs operating at various bitrates. In particular, our method achieves higher-quality reconstructed speech at 1 kbps than Lyra V2 at 3.2 kbps and Encodec at 6 kbps.