Image codecs are typically optimized to trade off bitrate against distortion metrics. At low bitrates, this leads to compression artefacts that are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove the dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as on a global image description that provides additional context. We dub our model PerCo, for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, and compresses a 512×768 Kodak image into fewer than 153 bytes. Despite this ultra-low bitrate, our approach still reconstructs realistic images. We find that our model produces reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality depends less on the bitrate than it does for previous methods.
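The bitrate figures above can be checked with a short calculation. At b bits per pixel, a W×H image costs b·W·H/8 bytes, so 0.003 bpp on a 768×512 Kodak image comes to roughly 147.5 bytes, under the 153-byte bound. The sketch below also shows how the rate of a vector-quantized index map could be counted. The 32×48 token grid and 256-entry codebook are hypothetical values chosen for illustration, not PerCo's actual configuration. The count also ignores entropy coding and the bits spent on the global image description.

```python
import math


def bytes_at_rate(bpp: float, width: int, height: int) -> float:
    """Total payload in bytes for an image coded at `bpp` bits per pixel."""
    return bpp * width * height / 8


def vq_bits_per_pixel(grid_h: int, grid_w: int, codebook_size: int,
                      width: int, height: int) -> float:
    """Rate of a VQ index map where each token is stored with a fixed-length
    code of log2(codebook_size) bits (no entropy coding)."""
    bits = grid_h * grid_w * math.log2(codebook_size)
    return bits / (width * height)


if __name__ == "__main__":
    # Kodak images are 768x512 (or 512x768) pixels.
    for bpp in (0.1, 0.003):
        print(f"{bpp} bpp -> {bytes_at_rate(bpp, 768, 512):.1f} bytes")
    # Hypothetical token grid and codebook size, for illustration only.
    print(f"VQ map: {vq_bits_per_pixel(32, 48, 256, 768, 512):.5f} bpp")
```

Running it prints 4915.2 bytes at 0.1 bpp and 147.5 bytes at 0.003 bpp. The hypothetical 32×48 map with a 256-entry codebook costs 0.03125 bpp. This illustrates why reaching the lowest rates requires a very coarse token grid, a small codebook, or entropy coding of the indices.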