Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of codebooks per frame is suboptimal given the wide variability of audio content, especially for signals that are either very simple or highly complex. To address this limitation, we propose SwitchCodec, a neural audio codec based on Residual Experts Vector Quantization (REVQ). REVQ combines a shared quantizer with dynamically routed expert quantizers that are activated according to the input audio, which decouples bitrate from codebook capacity and improves compression efficiency. This design also ensures that every quantizer is fully trained and utilized. In addition, a variable-bitrate mechanism adjusts the number of active expert quantizers at inference, enabling multi-bitrate operation without retraining. Experiments demonstrate that SwitchCodec surpasses existing baselines on both objective metrics and subjective listening tests.
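The routing idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`revq_encode`, `revq_decode`), the linear router applied to the time-averaged latent, the per-utterance top-k selection, and the random codebooks are all assumptions made for illustration. It only shows how one shared quantizer and a selected subset of expert quantizers could be chained residually, with `k` controlling the number of code streams at inference.

```python
import numpy as np

def nearest_code(x, codebook):
    """Return (indices, vectors) of the nearest codebook entry for each row of x."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def revq_encode(latents, shared_cb, expert_cbs, router_w, k):
    """Quantize latents (T, D) with the shared codebook plus k routed experts.

    Assumption: the router is a single linear map on the time-averaged
    latent, and the k highest-scoring experts are used for the whole input.
    """
    scores = latents.mean(axis=0) @ router_w           # one score per expert, shape (N,)
    active = np.sort(np.argsort(scores)[::-1][:k])     # indices of the k selected experts
    residual = latents.copy()
    quantized = np.zeros_like(latents)
    codes = []
    # The shared quantizer always runs first; the routed experts refine its residual.
    for cb in [shared_cb] + [expert_cbs[i] for i in active]:
        idx, q = nearest_code(residual, cb)
        quantized += q
        residual -= q
        codes.append(idx)
    return active, codes, quantized

def revq_decode(active, codes, shared_cb, expert_cbs):
    """Rebuild the quantized latents by summing the selected codebook entries."""
    books = [shared_cb] + [expert_cbs[i] for i in active]
    return sum(cb[idx] for cb, idx in zip(books, codes))

# Hypothetical sizes: 8 frames, 4-dim latents, 6 experts, 16 entries per codebook.
rng = np.random.default_rng(0)
T, D, N, K = 8, 4, 6, 16
latents = rng.normal(size=(T, D))
shared_cb = rng.normal(size=(K, D))
expert_cbs = rng.normal(size=(N, K, D))
router_w = rng.normal(size=(D, N))

# Changing k at inference changes the number of code streams without retraining.
for k in (1, 3):
    active, codes, _ = revq_encode(latents, shared_cb, expert_cbs, router_w, k)
    print(k, active, len(codes))
```

Under these assumptions, each frame costs `(1 + k) * log2(K)` bits for codebook indices, plus a small amount of side information identifying the selected experts, while the total codebook capacity available to the router stays at `N` experts regardless of `k`.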