Diffusion probabilistic models have achieved mainstream success in many generative modeling tasks, from image generation to inverse problem solving. A distinct feature of these models is that they correspond to deep hierarchical latent variable models optimizing a variational evidence lower bound (ELBO) on the data likelihood. Drawing on a basic connection between likelihood modeling and compression, we explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded with progressively improving reconstruction quality. Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process, whose negative ELBO corresponds to the end-to-end compression cost using universal quantization. We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model, bringing neural codecs a step closer to practical deployment.
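The abstract's claim that uniform forward-process noise matches the cost of universal quantization rests on a standard property of subtractive dithering: with a dither shared between encoder and decoder, the reconstruction error is exactly uniform and independent of the input, so quantization behaves like an additive uniform-noise channel. A minimal sketch of that property (step size 1; the function name and setup are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def universal_quantize(x, u):
    """Subtractive-dithered (universal) quantization with unit step size.

    A dither u ~ Uniform(-1/2, 1/2), shared by encoder and decoder, makes
    the reconstruction error Uniform(-1/2, 1/2) and independent of x --
    i.e. the channel is equivalent to adding uniform noise, mirroring the
    uniform-noise forward process described in the abstract.
    """
    q = np.round(x - u)  # integer symbols that would be entropy-coded
    return q + u         # decoder adds the shared dither back

x = rng.normal(size=100_000)
u = rng.uniform(-0.5, 0.5, size=x.shape)
x_hat = universal_quantize(x, u)
err = x_hat - x  # should be uniform on [-1/2, 1/2], variance ~ 1/12
```

Under this scheme the bits actually transmitted are the entropy-coded integers `q`, and the rate paid for them is what the negative ELBO of the uniform-noise diffusion model accounts for end to end.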