Learning-based image compression methods have recently advanced significantly, surpassing traditional coding standards. Most of them prioritize the best rate-distortion performance at a particular compression rate, which limits their flexibility and adaptability in applications with complex and varying constraints. In this work, we explore the potential of resolution fields in scalable image compression and propose the reciprocal pyramid network (RPN), which addresses the need for more adaptable and versatile compression. Specifically, RPN first builds a compression pyramid and generates resolution fields at different levels in a top-down manner. The key design is the cross-resolution context mining module between adjacent levels, which performs feature enriching and distillation to mine meaningful contextual information and remove redundancy, producing informative resolution fields as residual priors. Scalability is achieved through progressive bitstream reuse and resolution field incorporation that varies across levels. Furthermore, between adjacent compression levels, we explicitly quantify the aleatoric uncertainty of the bottom decoded representations and develop an uncertainty-guided loss to update the upper-level compression parameters, forming a reverse pyramid process that forces the network to focus on textured, high-variance pixels for more reliable and accurate reconstruction. By combining resolution field exploration and uncertainty guidance in a pyramid manner, RPN effectively achieves both spatial and quality scalable image compression. Experiments show the superiority of RPN over existing classical and deep learning-based scalable codecs. Code will be available at https://github.com/JGIroro/RPNSIC.
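The abstract does not give the form of the uncertainty-guided loss, so the following is only a minimal sketch of the general idea of weighting reconstruction error by per-pixel variance. It assumes a variance map is already available from the lower-level decoded representation. The names `uncertainty_weights` and `uncertainty_guided_l1`, the L1 error, and the `1 + v / peak` weighting are all hypothetical choices made for illustration, not the authors' formulation.

```python
from typing import List

Image = List[List[float]]


def uncertainty_weights(variance: Image) -> Image:
    """Map a per-pixel variance map to weights in [1, 2] (assumed scheme)."""
    peak = max(max(row) for row in variance)
    if peak <= 0.0:
        # No uncertainty information: fall back to uniform weights.
        return [[1.0 for _ in row] for row in variance]
    return [[1.0 + v / peak for v in row] for row in variance]


def uncertainty_guided_l1(target: Image, recon: Image, variance: Image) -> float:
    """Mean absolute error where high-variance pixels count more (assumed form)."""
    weights = uncertainty_weights(variance)
    total = 0.0
    count = 0
    for t_row, r_row, w_row in zip(target, recon, weights):
        for t, r, w in zip(t_row, r_row, w_row):
            total += w * abs(t - r)
            count += 1
    return total / count


# Toy 2x2 example: the top-left pixel is the only high-variance one.
target = [[0.0, 0.0], [0.0, 0.0]]
recon = [[1.0, 0.0], [0.0, 1.0]]
variance = [[4.0, 0.0], [0.0, 0.0]]
loss = uncertainty_guided_l1(target, recon, variance)
```

In this toy case the two erroneous pixels have the same absolute error, but the one with high variance contributes twice as much to the loss, which is the kind of emphasis on textured regions the abstract describes.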