Deep learning-based image compression has advanced notably in recent years. However, prevalent schemes that rely on a serial context-adaptive entropy model to improve rate-distortion (R-D) performance are markedly slow. In addition, their encoding and decoding networks are highly complex, which makes them unsuitable for some practical applications. In this paper, we propose two techniques to balance complexity against performance. First, we introduce two branching coding networks that independently learn a low-resolution and a high-resolution latent representation of the input image, so that its global and local information are represented distinctly. Second, we use the high-resolution latent representation as conditional information for the low-resolution latent representation, furnishing it with global information and thereby helping to reduce redundancy within the low-resolution information. We do not use any serial entropy model. Instead, we employ a parallel channel-wise auto-regressive entropy model to encode and decode both the low-resolution and the high-resolution latent representations. Experiments show that our method is approximately twice as fast in both encoding and decoding as the parallelizable checkerboard context model, and that it achieves a 1.2% improvement in R-D performance over state-of-the-art learned image compression schemes. Our method also outperforms classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods in rate-distortion performance, as validated by both PSNR and MS-SSIM on the Kodak dataset.
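To make the channel-wise auto-regressive idea concrete, the following is a minimal toy sketch, not the network described in the abstract. It assumes the latent is split into contiguous channel slices, that each slice's mean is predicted only from a conditioning signal plus the slices already decoded, and that the conditioning signal has already been resampled to the latent's spatial size. The names `split_slices`, `predict_mean`, `encode`, and `decode` are hypothetical, and the averaging predictor merely stands in for a learned parameter network. Actual entropy coding of the integer symbols is omitted.

```python
# Toy sketch of channel-wise auto-regressive coding (illustrative assumptions only).
# A latent is a list of channels; each channel is a flat list with one value
# per spatial position.

def split_slices(latent, num_slices):
    """Split the channel list into contiguous groups.

    Assumes len(latent) is divisible by num_slices.
    """
    per = len(latent) // num_slices
    return [latent[i * per:(i + 1) * per] for i in range(num_slices)]

def predict_mean(decoded_slices, condition):
    """Hypothetical stand-in for a learned parameter network.

    Returns one mean per spatial position, computed from the conditioning
    signal and the channels decoded so far. Positions do not depend on each
    other, so this step is parallel over space; only slices are sequential.
    """
    channels = [ch for sl in decoded_slices for ch in sl]
    means = []
    for pos, cond in enumerate(condition):
        ctx = sum(ch[pos] for ch in channels) / len(channels) if channels else 0.0
        means.append(0.5 * cond + 0.5 * ctx)
    return means

def encode(latent, condition, num_slices):
    """Quantize each slice's residual against its predicted mean."""
    decoded, symbols = [], []
    for sl in split_slices(latent, num_slices):
        mu = predict_mean(decoded, condition)
        sym = [[round(v - m) for v, m in zip(ch, mu)] for ch in sl]
        symbols.append(sym)
        # The encoder tracks exactly what the decoder will reconstruct.
        decoded.append([[s + m for s, m in zip(ch, mu)] for ch in sym])
    return symbols, decoded

def decode(symbols, condition):
    """Rebuild slices in the same order, using only already-decoded slices."""
    decoded = []
    for sym in symbols:
        mu = predict_mean(decoded, condition)
        decoded.append([[s + m for s, m in zip(ch, mu)] for ch in sym])
    return decoded

if __name__ == "__main__":
    latent = [[0.2, 1.7, -2.4], [3.1, -0.6, 0.9],
              [1.5, 2.5, -1.2], [-0.3, 0.4, 2.8]]
    condition = [1.0, -1.0, 0.5]  # assumed to match the latent's spatial size
    symbols, reconstructed = encode(latent, condition, num_slices=2)
    print(decode(symbols, condition) == reconstructed)
```

In this sketch the number of sequential steps equals the number of channel slices rather than the number of spatial positions, which is the property that distinguishes a channel-wise model from a serial spatial context model.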