Although recent generative image compression methods have shown impressive potential in optimizing the rate-distortion-perception trade-off, they still struggle to adapt their bitrate flexibly to diverse compression requirements and scenarios. To address this challenge, this paper proposes a Controllable Generative Image Compression framework, termed Control-GIC, the first capable of fine-grained bitrate adaptation across a broad spectrum while ensuring high-fidelity, general-purpose compression. Control-GIC is grounded in a VQGAN framework that encodes an image as a sequence of variable-length codes (i.e., VQ-indices), which can be losslessly compressed and exhibit a direct positive correlation with the bitrate. Drawing inspiration from classical coding principles, we correlate the information density of local image patches with the granularity of their representations. We can therefore flexibly allocate an appropriate granularity to each patch, dynamically adjusting the VQ-indices to reach the desired compression rate. We further develop a probabilistic conditional decoder that retrieves the previously encoded multi-granularity representations according to the transmitted codes and then reconstructs hierarchical granular features, formulated as conditional probabilities, enabling more informative aggregation that improves reconstruction realism. Experiments show that Control-GIC allows highly flexible and controllable bitrate adaptation and outperforms recent state-of-the-art methods.
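The idea of tying a patch's information density to its representation granularity can be illustrated with a minimal sketch. This is not the Control-GIC implementation: the entropy measure, the 16-pixel patch size, the three granularity levels, the `ratios` knob, and the assumed code counts per patch (16, 4, 1) are all hypothetical choices made for illustration, and the input is assumed to be a grayscale array with values in [0, 1].

```python
import numpy as np

def patch_entropy(patch, bins=32):
    """Shannon entropy (bits) of a patch's intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def allocate_granularity(image, patch=16, ratios=(0.2, 0.3, 0.5)):
    """Assign each patch a level: 0 = fine, 1 = medium, 2 = coarse.

    `ratios` is a hypothetical control knob giving the fraction of patches
    placed at the fine and medium levels; the remainder becomes coarse.
    """
    h, w = image.shape
    rows, cols = h // patch, w // patch
    ent = np.array([
        patch_entropy(image[r * patch:(r + 1) * patch,
                            c * patch:(c + 1) * patch])
        for r in range(rows) for c in range(cols)
    ])
    order = np.argsort(-ent, kind="stable")  # highest entropy first
    n = ent.size
    n_fine = int(round(ratios[0] * n))
    n_mid = int(round(ratios[1] * n))
    levels = np.full(n, 2, dtype=np.int64)   # default: coarse
    levels[order[:n_fine]] = 0               # densest patches: fine
    levels[order[n_fine:n_fine + n_mid]] = 1
    return levels.reshape(rows, cols)

# Assumed number of VQ-indices spent on a patch at each level
# (4x4, 2x2, and 1x1 code grids); purely illustrative.
CODES_PER_PATCH = np.array([16, 4, 1])

def index_count(levels):
    """Total number of indices implied by a granularity map."""
    return int(CODES_PER_PATCH[levels].sum())
```

Under these assumptions, raising the fine-level fraction in `ratios` increases the value returned by `index_count`, which mirrors the abstract's point that the number of transmitted indices moves together with the bitrate.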