Neural Audio Codecs (NACs) reduce transmission overhead through compact compression and reconstruction, and they also aim to bridge the gap between continuous and discrete signals. Existing NACs fall into two categories: multi-codebook and single-codebook codecs. Multi-codebook codecs are structurally complex and difficult to adapt to downstream tasks. Single-codebook codecs are structurally simpler, but they suffer from low fidelity, ineffective modeling of unified audio, and an inability to model high-frequency audio. We propose UniSRCodec, a single-codebook codec that supports high sampling rates, low bandwidth, high fidelity, and unified audio modeling. We analyze the inefficiency of waveform-based compression and introduce a time-frequency compression method based on the Mel-spectrogram, paired with a vocoder that recovers the phase information of the original audio. Moreover, we propose a sub-band reconstruction technique to achieve high-quality compression across both low- and high-frequency bands. Subjective and objective experimental results demonstrate that UniSRCodec achieves state-of-the-art (SOTA) performance among cross-domain single-codebook codecs at a token rate of only 40, and that its reconstruction quality is comparable to that of certain multi-codebook methods. Our demo page is available at https://wxzyd123.github.io/unisrcodec.
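To make the "token rate of only 40" concrete, the arithmetic below shows how a single-codebook token stream translates into a bitrate: each token carries one codebook index, i.e. log2(K) bits for a codebook of size K. This is an illustrative sketch only. It reads the token rate as tokens per second, and the codebook size (16384) and sampling rate (48 kHz) are hypothetical values chosen for illustration; neither is stated in this abstract.

```python
import math


def single_codebook_bitrate(token_rate_hz: float, codebook_size: int) -> float:
    """Bits per second for a single-codebook codec: one log2(K)-bit index per token."""
    return token_rate_hz * math.log2(codebook_size)


def samples_per_token(sample_rate_hz: int, token_rate_hz: float) -> float:
    """Number of waveform samples that each discrete token must summarize."""
    return sample_rate_hz / token_rate_hz


TOKEN_RATE = 40        # from the abstract; interpreted here as tokens per second (assumption)
CODEBOOK_SIZE = 16384  # hypothetical codebook size, not given in the abstract
SAMPLE_RATE = 48_000   # hypothetical "high" sampling rate, not given in the abstract

if __name__ == "__main__":
    bps = single_codebook_bitrate(TOKEN_RATE, CODEBOOK_SIZE)
    print(f"bitrate: {bps:.0f} bps ({bps / 1000:.2f} kbps)")
    print(f"samples per token: {samples_per_token(SAMPLE_RATE, TOKEN_RATE):.0f}")
```

Under these assumed values, each token would have to summarize 1200 waveform samples, which illustrates why compressing a time-frequency representation such as the Mel-spectrogram, rather than the raw waveform, can be attractive at such low token rates.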