Recent generative compression methods have made remarkable progress in enhancing the perceptual quality of compressed data, especially at low bitrates. However, their efficacy and applicability at extreme compression ratios ($<0.05$ bpp) remain limited. In this work, we propose a simple yet effective coding framework that introduces vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model has strong expressive capacity, enabling efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ indices by mapping each latent vector to its nearest codeword, and these indices can then be encoded into a bitstream using lossless compression methods. We propose clustering a pre-trained large-scale codebook into smaller codebooks with the K-means algorithm, which yields variable bitrates and different levels of reconstruction quality within the same coding framework. Furthermore, we introduce a transformer to predict lost indices and restore images in unstable environments. Extensive qualitative and quantitative experiments on various benchmark datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates ($\le 0.04$ bpp). Remarkably, even when up to $20\%$ of the indices are lost, the images can be effectively restored with minimal perceptual loss.
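
As a rough illustration of the index representation and of how codebook size relates to bitrate, the following minimal NumPy sketch maps latent vectors to their nearest codewords and computes the bitrate obtained if each index were stored with a fixed-length code. It is not the proposed implementation: the function names `quantize` and `fixed_length_bpp`, the random toy codebook, the latent dimension of 8, the $256 \times 256$ image size, and the downsampling factor of 16 are all assumptions made for the example, and lossless coding of the indices would generally give a bitrate at or below the fixed-length figure.

```python
import math

import numpy as np


def quantize(latents, codebook):
    """Return, for each row of latents (N, D), the index of the nearest row of codebook (K, D)."""
    # Squared Euclidean distance expanded as ||z||^2 - 2 z.c + ||c||^2.
    dist = (
        (latents ** 2).sum(axis=1, keepdims=True)
        - 2.0 * latents @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return dist.argmin(axis=1)


def fixed_length_bpp(height, width, downsample, codebook_size):
    """Bits per pixel when every index is stored with ceil(log2(K)) bits."""
    tokens = (height // downsample) * (width // downsample)
    bits_per_index = math.ceil(math.log2(codebook_size))
    return tokens * bits_per_index / (height * width)


# Toy codebook (assumption): 1024 random codewords of dimension 8.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 8))

# Latents built as slightly perturbed copies of three known codewords.
latents = codebook[[3, 500, 1023]] + 0.01 * rng.normal(size=(3, 8))
indices = quantize(latents, codebook)

# Hypothetical 256x256 image with a 16x downsampling encoder.
bpp_large = fixed_length_bpp(256, 256, 16, 1024)  # 256 tokens * 10 bits / 65536 pixels
bpp_small = fixed_length_bpp(256, 256, 16, 256)   # 256 tokens * 8 bits / 65536 pixels
```

Under these assumed settings, `bpp_large` is 0.0390625 and `bpp_small` is 0.03125, which shows why replacing a large codebook with a smaller clustered one lowers the bitrate: each index needs fewer bits, at the cost of a coarser set of codewords.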