Deep-learning-based image compression has advanced notably in recent years. However, prevalent schemes that rely on a serial context-adaptive entropy model to improve rate-distortion (R-D) performance are markedly slow. Moreover, their encoding and decoding networks are highly complex, which makes them unsuitable for some practical applications. In this paper, we propose two techniques to balance complexity and performance. First, we introduce two branch coding networks that independently learn a low-resolution latent representation and a high-resolution latent representation of the input image, so that its global and local information are represented distinctly. Second, we use the high-resolution latent representation as conditional information for the low-resolution latent representation, furnishing it with global information and thereby helping to reduce redundancy in the low-resolution information. We do not use any serial entropy model. Instead, we employ a parallel channel-wise auto-regressive entropy model to encode and decode both the low-resolution and the high-resolution latent representations. Experiments show that our method is approximately twice as fast in both encoding and decoding as the parallelizable checkerboard context model, and that it improves R-D performance by 1.2% over state-of-the-art learned image compression schemes. Our method also outperforms classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods in R-D performance, as measured by both PSNR and MS-SSIM on the Kodak dataset.
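To make the speed argument concrete, the following minimal sketch counts sequential decoding steps and lists a possible decoding order. It is an illustration under stated assumptions, not the implementation described in the paper: the latent sizes, the number of channel slices, and all names (`LatentShape`, `serial_context_steps`, `channel_wise_steps`, `decoding_schedule`, `y_high`, `y_low`) are hypothetical. It assumes that a serial context model decodes one spatial position per step in raster order, that a channel-wise auto-regressive model decodes one channel slice per step with all spatial positions in parallel, and that the full high-resolution latent is decoded before the low-resolution latent it conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    """Shape of a latent tensor (hypothetical sizes, not from the paper)."""
    channels: int
    height: int
    width: int


def serial_context_steps(shape: LatentShape) -> int:
    """Sequential steps for a raster-scan spatial context model.

    Assumption: one spatial position is decoded per step.
    """
    return shape.height * shape.width


def channel_wise_steps(shape: LatentShape, num_slices: int) -> int:
    """Sequential steps for a channel-wise auto-regressive model.

    Assumption: channels are split into equal slices; each slice is decoded
    in one step, with all spatial positions processed in parallel.
    """
    if num_slices <= 0 or shape.channels % num_slices != 0:
        raise ValueError("channels must be divisible by a positive num_slices")
    return num_slices


def decoding_schedule(num_slices: int) -> list[tuple[str, list[str]]]:
    """Return (slice, conditioning inputs) pairs in decoding order.

    Assumption: the high-resolution latent is decoded first so that it can
    serve as conditional information for every low-resolution slice.
    """
    schedule = []
    for i in range(num_slices):
        # Each high-resolution slice depends only on earlier slices.
        schedule.append((f"y_high[{i}]", [f"y_high[{j}]" for j in range(i)]))
    for i in range(num_slices):
        # Each low-resolution slice also sees the whole high-resolution latent.
        context = ["y_high"] + [f"y_low[{j}]" for j in range(i)]
        schedule.append((f"y_low[{i}]", context))
    return schedule


if __name__ == "__main__":
    # Hypothetical latent of a 768x512 Kodak image downsampled by 16.
    high = LatentShape(channels=320, height=32, width=48)
    print("serial steps:", serial_context_steps(high))
    print("channel-wise steps:", channel_wise_steps(high, num_slices=10))
    for name, context in decoding_schedule(num_slices=2):
        print(name, "<-", context)
```

Under these assumptions, the number of sequential steps of the channel-wise model depends only on the number of slices, whereas that of the serial model grows with the spatial size of the latent. This is the general reason a parallel entropy model can be faster; the sketch does not reproduce the timing results reported above.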