We present VCNAC, a variable-channel neural audio codec. Our approach uses a single encoder and decoder parametrization that enables native inference across different channel setups, from mono speech to cinematic 5.1-channel surround audio. Channel compatibility objectives ensure that multi-channel content maintains perceptual quality when decoded to fewer channels. The shared representation enables training of generative language models on a single set of codebooks while supporting inference-time scalability across modalities and channel configurations. Evaluation using objective spatial audio metrics and subjective listening tests demonstrates that our unified approach maintains high reconstruction quality across mono, stereo, and surround audio configurations.