LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression

The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be used to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERFs remain too small, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel depth-wise convolutions that remove more redundancy while maintaining modest complexity. Because images are highly diverse, we enhance the adaptability of the convolutions by generating their weights in a self-conditioned manner. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. We also investigate improved training techniques to fully exploit the potential of large kernels. In addition, to enhance the interactions among channels, we propose adaptive channel-wise bit allocation, which generates a channel importance factor in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model with those of existing transform methods for comparison and obtain the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that the proposed LLIC models improve significantly over their corresponding baselines, achieving state-of-the-art performance and a better trade-off between performance and complexity.
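To make the idea of a large-kernel depth-wise convolution with self-conditioned weights concrete, the following is a minimal sketch in plain Python, not the LLIC implementation. The function names, the sigmoid-of-channel-mean weight generator, and the channel count and kernel size used in the weight-count comparison are all assumptions chosen for illustration; the actual model learns its weight generator and combines it with the embedding and gate mechanisms mentioned above.

```python
import math


def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution with zero padding and stride 1.

    x:       nested lists of shape [C][H][W]
    kernels: nested lists of shape [C][K][K], K odd; one kernel per channel,
             so channels are never mixed (that is what "depth-wise" means).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(kernels[0])
    r = K // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for u in range(K):
                    for v in range(K):
                        ii, jj = i + u - r, j + v - r
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernels[c][u][v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def generate_kernels(x, base_kernels):
    """Hypothetical stand-in for a learned weight generator.

    Each channel's base kernel is rescaled by sigmoid(mean of that channel),
    so the effective weights depend on the input itself (self-conditioning).
    """
    gates = []
    for ch in x:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return [[[g * w for w in row] for row in k]
            for g, k in zip(gates, base_kernels)]


def adaptive_large_kernel_conv(x, base_kernels):
    """Depth-wise convolution whose kernels are generated from the input."""
    return depthwise_conv2d(x, generate_kernels(x, base_kernels))


def conv_weight_counts(channels, kernel_size):
    """Weights of a dense K x K convolution vs. a depth-wise one (no bias)."""
    dense = channels * channels * kernel_size * kernel_size
    depthwise = channels * kernel_size * kernel_size
    return dense, depthwise


# Illustrative usage with made-up sizes: 2 channels, 5x5 input, 5x5 kernels.
x = [[[1.0] * 5 for _ in range(5)] for _ in range(2)]
base = [[[0.04] * 5 for _ in range(5)] for _ in range(2)]
y = adaptive_large_kernel_conv(x, base)
dense, depthwise = conv_weight_counts(192, 11)  # hypothetical C and K
```

The weight-count helper illustrates why depth-wise kernels keep complexity modest: a dense convolution needs a factor of `channels` more weights than a depth-wise one with the same kernel size, so enlarging the kernel is far cheaper in the depth-wise case.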
