Neural image compression has made significant progress in recent years. State-of-the-art models are based on variational autoencoders and outperform classical codecs. Neural compression models learn to encode an image into a quantized latent representation that can be efficiently transmitted to the decoder, which reconstructs the image from the quantized latent. While these models have proven successful in practice, they yield sub-optimal results due to imperfect optimization and limitations in encoder and decoder capacity. Recent work shows how stochastic Gumbel annealing (SGA) can be used to refine the latents of pre-trained neural image compression models. We extend this idea by introducing SGA+, which contains three different methods that build upon SGA. We show how our methods improve overall compression performance in terms of the rate-distortion (R-D) trade-off, compared to their predecessors. Additionally, we show how refinement of the latents with our best-performing method improves compression performance on both the Tecnick and CLIC datasets. Our method is deployed for a pre-trained hyperprior and for a more flexible model. Further, we give a detailed analysis of our proposed methods and show that they are less sensitive to hyperparameter choices. Finally, we show how each method can be extended to three-class instead of two-class rounding.
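To make the central ingredient concrete, the sketch below illustrates SGA-style latent refinement: latents from a frozen pre-trained model are optimized directly against the R-D objective, with hard two-class rounding relaxed via Gumbel-softmax under an annealed temperature. This is a minimal sketch under assumed conventions, not the paper's exact procedure: the callable `rd_loss` (rate plus lambda-weighted distortion from the frozen model), the function names, the logit construction, and all hyperparameters are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def two_class_logits(y):
    """Unnormalized log-probabilities for rounding each latent down or up.

    The logit for each rounding direction shrinks as the latent moves away
    from that integer; atanh maps the fractional distance into (-inf, 0].
    """
    d = torch.clamp(y - torch.floor(y), 1e-6, 1 - 1e-6)  # fractional part
    return torch.stack([-torch.atanh(d), -torch.atanh(1 - d)], dim=-1)


def refine_latents(y_init, rd_loss, steps=2000, lr=5e-3, tau0=0.5, decay=1e-3):
    """Refine encoder latents by gradient descent on the R-D loss.

    y_init: latents produced by the pre-trained encoder (detached).
    rd_loss: assumed callable computing rate + lambda * distortion from the
             frozen decoder/entropy model for a given (soft-rounded) latent.
    """
    y = y_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    for t in range(steps):
        tau = max(tau0 * math.exp(-decay * t), 1e-3)  # annealed temperature
        logits = two_class_logits(y)
        # Sample relaxed (soft) one-hot rounding decisions via Gumbel-softmax;
        # gradients flow to y through the rounding probabilities.
        w = F.gumbel_softmax(logits, tau=tau, hard=False)
        y_soft = w[..., 0] * torch.floor(y) + w[..., 1] * torch.ceil(y)
        loss = rd_loss(y_soft)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.round(y.detach())  # hard rounding for the final bitstream
```

As the temperature decays, the sampled weights approach hard one-hot rounding, closing the gap between the relaxed objective used during refinement and the true quantized rate-distortion cost; a three-class variant would extend the logits to a third rounding candidate.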