Image compression at extremely low bitrates (below 0.1 bits per pixel, bpp) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat the latent representation of images in the diffusion space as guidance, employing a VAE-based compression approach to compress images and initially decode the compressed information into content variables. The second stage leverages pre-trained stable diffusion to reconstruct images under the guidance of content variables. Specifically, we introduce a small control module to inject content information while keeping the stable diffusion model fixed to preserve its generative capability. Furthermore, we design a space alignment loss that forces the content variables to align with the diffusion space and provides the necessary constraints for optimization. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in terms of visual quality at extremely low bitrates. The source code and trained models are available at https://github.com/huai-chang/DiffEIC.
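To make the space alignment idea concrete, here is a minimal sketch assuming the loss is a mean-squared error that pulls the decoded content variables toward the latent produced by the diffusion model's VAE encoder; the shapes, function name, and exact formulation are illustrative assumptions, not the paper's verbatim implementation.

```python
import numpy as np

def space_alignment_loss(content_vars: np.ndarray,
                         diffusion_latents: np.ndarray) -> float:
    """MSE between the content variables (first-stage decoder output)
    and the corresponding latents in the pre-trained diffusion space.
    A common choice for such an alignment objective; hypothetical here."""
    return float(np.mean((content_vars - diffusion_latents) ** 2))

# Toy example with a 4x64x64 latent (Stable Diffusion's latent shape
# for a 512x512 input image).
rng = np.random.default_rng(0)
z_diff = rng.standard_normal((4, 64, 64))                      # diffusion-space latent
z_content = z_diff + 0.1 * rng.standard_normal((4, 64, 64))    # slightly off content variables

loss = space_alignment_loss(z_content, z_diff)
```

Minimizing this term drives `z_content` toward `z_diff`, so the fixed stable diffusion model receives guidance that lies in the latent distribution it was trained on.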