Compressing images at extremely low bitrates, below 0.1 bits per pixel (bpp), is a significant challenge due to substantial information loss. Existing extreme image compression methods generally suffer from heavy compression artifacts or low-fidelity reconstructions. To address this problem, we propose a novel extreme image compression framework that combines compressive VAEs and pre-trained text-to-image diffusion models in an end-to-end manner. Specifically, we introduce a latent feature-guided compression module based on compressive VAEs. This module compresses images and initially decodes the compressed information into content variables. To improve the alignment between the content variables and the diffusion space, we introduce external guidance to modulate intermediate feature maps. We then develop a conditional diffusion decoding module that leverages pre-trained diffusion models to further decode these content variables. To preserve the generative capability of the pre-trained diffusion models, we keep their parameters fixed and use a control module to inject content information. We also design a space alignment loss to provide sufficient constraints for the latent feature-guided compression module. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both visual quality and image fidelity at extremely low bitrates.
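For a sense of scale, the 0.1 bpp threshold can be converted into a byte budget with simple arithmetic. The following is a minimal sketch and not part of the proposed framework; the function names `bits_per_pixel` and `byte_budget` and the 768×512 image size are illustrative assumptions.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bitrate of a compressed image in bits per pixel (bpp)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return num_bytes * 8 / (width * height)


def byte_budget(bpp: float, width: int, height: int) -> int:
    """Largest whole number of bytes that stays at or below the given bpp."""
    return int(bpp * width * height // 8)


if __name__ == "__main__":
    width, height = 768, 512  # hypothetical image size, chosen for illustration

    # Byte budget for the whole image at the 0.1 bpp threshold.
    budget = byte_budget(0.1, width, height)
    print(f"budget at 0.1 bpp: {budget} bytes")

    # Uncompressed 8-bit RGB uses 3 bytes per pixel, i.e. 24 bpp.
    raw_bpp = bits_per_pixel(width * height * 3, width, height)
    print(f"uncompressed RGB: {raw_bpp} bpp")
    print(f"compression ratio at 0.1 bpp: about {raw_bpp / 0.1:.0f}:1")
```

Under these assumptions the entire image must fit in just under 5 KB, roughly a 240:1 reduction relative to uncompressed 24-bit RGB, which illustrates why substantial information loss is unavoidable in this regime.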