Learned image compression methods have shown superior rate-distortion performance and considerable potential compared with traditional compression methods. Most existing learned approaches use stacked convolutions or window-based self-attention for transform coding, both of which aggregate spatial information over a fixed range. In this paper, we focus on extending the spatial aggregation capability and propose dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets so that the transform captures valid information within a content-conditioned range. With the adaptive aggregation strategy and a weight-sharing mechanism, our method achieves promising transform capability with acceptable model complexity. In addition, building on recent progress in entropy modeling, we define a generalized coarse-to-fine entropy model that considers the coarse global context, the channel-wise context, and the spatial context. Based on this model, we introduce the dynamic kernel into the hyper-prior to generate a more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model based on an investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to state-of-the-art learning-based methods.
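To make the idea of offset-based adaptive aggregation concrete, the following is a minimal sketch in plain Python. It is not the implementation from the paper: it assumes a single-channel feature map, a 3x3 sampling grid, and offsets that are supplied as input rather than predicted by a network. The names `bilinear_sample`, `adaptive_aggregate`, and `zero_offsets` are hypothetical and introduced only for illustration. Each output position sums nine sampling points whose locations are shifted by per-position offsets, while the nine kernel weights are shared across all positions.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D map at a fractional location; positions outside the map count as zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours of (y, x) with bilinear weights.
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * feat[yy][xx]
    return value


# Regular 3x3 sampling grid around each output position (assumed kernel size).
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def adaptive_aggregate(feat, offsets, weights):
    """Aggregate each position from 9 sampling points shifted by per-position offsets.

    feat:    H x W nested list of floats
    offsets: H x W x 9 nested list of (dy, dx) pairs; assumed here to be given,
             whereas a real model would predict them from the content
    weights: 9 kernel weights shared by every position
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (dy, dx) in enumerate(GRID):
                oy, ox = offsets[i][j][k]
                acc += weights[k] * bilinear_sample(feat, i + dy + oy, j + dx + ox)
            out[i][j] = acc
    return out


def zero_offsets(h, w):
    """Offsets that leave the grid unchanged, i.e. an ordinary fixed-range 3x3 aggregation."""
    return [[[(0.0, 0.0)] * 9 for _ in range(w)] for _ in range(h)]
```

With all offsets set to zero, `adaptive_aggregate` reduces to an ordinary zero-padded 3x3 aggregation over a fixed range. Non-zero offsets move the sampling points, so the effective range depends on the content that produced the offsets.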