The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be exploited to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERF remains too small, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel depth-wise convolutions to remove more redundancy while maintaining modest complexity. Because images are highly diverse, we enhance the adaptability of the convolutions by generating their weights in a self-conditioned manner. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. We also investigate improved training techniques to fully exploit the potential of large kernels. In addition, to enhance the interactions among channels, we propose adaptive channel-wise bit allocation, which generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model with those of existing transform methods for comparison and obtain the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that the proposed LLIC models improve significantly over their corresponding baselines, achieve state-of-the-art performance, and offer a better trade-off between performance and complexity.
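The trade-off between stacked small kernels and a single large depth-wise kernel can be illustrated with simple arithmetic. The sketch below is not taken from the LLIC implementation. It computes the theoretical receptive field of a chain of convolutions, which is only an upper bound on the ERF, and compares weight counts for dense and depth-wise layers. The channel width of 192 and the kernel size of 11 are hypothetical values chosen for illustration.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Theoretical receptive field (along one axis) of a chain of conv layers.

    This is an upper bound: the effective receptive field is generally smaller.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # a single output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between adjacent positions of the current layer
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def dense_conv_weights(channels, k):
    """Weights (no bias) of a standard k x k conv with `channels` inputs and outputs."""
    return channels * channels * k * k


def depthwise_conv_weights(channels, k):
    """Weights (no bias) of a depth-wise k x k conv: one k x k filter per channel."""
    return channels * k * k


if __name__ == "__main__":
    channels = 192   # hypothetical channel width, not taken from the paper
    large_k = 11     # hypothetical large kernel size, not taken from the paper

    # Five stacked 3x3 stride-1 layers reach the same theoretical field as one 11x11 layer.
    small_stack = [3] * 5
    print("stacked 3x3 field:", stacked_receptive_field(small_stack))
    print("single large field:", stacked_receptive_field([large_k]))

    print("dense 3x3 stack weights:", len(small_stack) * dense_conv_weights(channels, 3))
    print("depth-wise large-kernel weights:", depthwise_conv_weights(channels, large_k))
```

Under these assumed sizes, the depth-wise layer covers the same theoretical field with far fewer weights, because its cost grows linearly with the channel count rather than quadratically. The point-wise interactions mentioned in the abstract would add their own cost, which this sketch does not model.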