A multichannel extension to the RVQGAN neural coding method is proposed and realized for data-driven compression of third-order Ambisonics audio. The input and output layers of the generator and discriminator models are modified to accept multiple channels (16 for third-order Ambisonics) without increasing the model bitrate. We also propose a loss function that accounts for spatial perception in immersive reproduction, as well as transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbit/s.
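Why widening only the input and output layers leaves the bitrate unchanged can be shown with a minimal sketch. It assumes a DAC-style 1D convolutional encoder whose first layer maps C input channels to D feature channels, and an RVQ bottleneck whose bitrate is frame rate × number of codebooks × log2(codebook size). Under those assumptions, going from 1 to 16 input channels changes only the first layer's weight shape, so the latent sequence reaching the quantizer has the same shape either way. The helpers `ambisonics_channels`, `conv1d`, and `rvq_bitrate`, and all sizes used (D = 64, kernel size 7, a 100 Hz frame rate, 16 codebooks of 1024 entries), are hypothetical. They were chosen only to illustrate a 16 kbit/s budget and are not the paper's configuration.

```python
import math
import numpy as np

def ambisonics_channels(order: int) -> int:
    # Full-sphere Ambisonics of order N has (N + 1)^2 channels.
    return (order + 1) ** 2

def conv1d(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """'Same'-padded 1D convolution (cross-correlation, as in deep-learning libraries).

    x: (in_channels, time); weight: (out_channels, in_channels, kernel_size), odd kernel_size.
    Returns an array of shape (out_channels, time).
    """
    out_ch, in_ch, k = weight.shape
    assert x.shape[0] == in_ch and k % 2 == 1
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows[c, t, j] == xp[c, t + j]; shape (in_channels, time, kernel_size)
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    # Sum over input channels and kernel taps for every output channel and time step.
    return np.einsum("oik,itk->ot", weight, windows)

def rvq_bitrate(frame_rate_hz: float, n_codebooks: int, codebook_size: int) -> float:
    # One index per codebook per latent frame, each costing log2(codebook_size) bits.
    # The number of audio input channels does not appear in this formula.
    return frame_rate_hz * n_codebooks * math.log2(codebook_size)

rng = np.random.default_rng(0)
T, D, K = 441, 64, 7          # hypothetical: time samples, first-layer width, kernel size
C = ambisonics_channels(3)    # third-order Ambisonics -> 16 channels

mono_in = rng.standard_normal((1, T))
hoa_in = rng.standard_normal((C, T))
mono_layer = rng.standard_normal((D, 1, K))   # single-channel input layer
hoa_layer = rng.standard_normal((D, C, K))    # only in_channels differs

mono_feat = conv1d(mono_in, mono_layer)
hoa_feat = conv1d(hoa_in, hoa_layer)
bitrate = rvq_bitrate(frame_rate_hz=100.0, n_codebooks=16, codebook_size=1024)
print(C, mono_feat.shape, hoa_feat.shape, bitrate)
```

Because both feature maps have shape (D, T), every layer after the input layer, including the quantizer, can stay identical to the single-channel model. This is also what makes transfer learning from single-channel weights straightforward in this sketch.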