High-fidelity general audio compression at ultra-low bitrates is crucial for applications ranging from low-bandwidth communication to generative audio-language modeling. Traditional audio compression methods and contemporary neural codecs are fundamentally designed for waveform reconstruction. As a result, when operating at ultra-low bitrates, they degrade rapidly and often fail to preserve essential information, leading to severe acoustic artifacts and pronounced semantic distortion. To overcome these limitations, we introduce Generative Audio Compression (GAC), a novel paradigm that shifts the objective from signal fidelity to task-oriented effectiveness. Implemented within the AI Flow framework, GAC is theoretically grounded in the Law of Information Capacity, which posits that abundant computational power at the receiver can offset extreme communication bottlenecks, exemplifying the "More Computation, Less Bandwidth" philosophy. By integrating semantic understanding at the transmitter with scalable generative synthesis at the receiver, GAC offloads the information burden to powerful model priors. Our 1.8B-parameter model achieves high-fidelity reconstruction of 32 kHz general audio at an unprecedented bitrate of 0.275 kbps. Even at 0.175 kbps, it still transmits intelligible audio, corresponding to a compression ratio of roughly 3000x, and significantly outperforms current state-of-the-art neural codecs in maintaining both perceptual quality and semantic consistency.
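For reference, a back-of-the-envelope check of the quoted compression ratio; this is a sketch that assumes 16-bit mono PCM at 32 kHz as the uncompressed baseline, an assumption not stated in the abstract itself:
\[
\frac{32{,}000~\text{samples/s} \times 16~\text{bits/sample}}{175~\text{bits/s}}
= \frac{512{,}000}{175}
\approx 2926 \approx 3000\times .
\]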