The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be exploited to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERFs remain insufficiently large, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel-based depth-wise convolutions to remove more redundancy while maintaining modest complexity. Because images vary widely, we further propose a mechanism to improve the adaptability of the convolutions through self-conditioned generation of their weights. The large kernels cooperate with non-linear embeddings and gate mechanisms for better expressiveness and lighter point-wise interactions. We also investigate refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit-allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model with existing transform methods for comparison, obtaining the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments show that the proposed LLIC models yield significant improvements over their corresponding baselines, reducing the BD-rate on Kodak over VTM-17.0 Intra by 9.49%, 9.47%, and 10.94%, respectively. Our LLIC models achieve state-of-the-art performance and a better trade-off between performance and complexity.
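To make the two core ideas concrete, the following is a minimal NumPy sketch of (a) a depth-wise convolution with one large kernel per channel and (b) a self-conditioned channel gate that derives per-channel importance factors from the input itself. This is an illustrative toy, not the authors' implementation: the function names, the 11×11 kernel size, and the squeeze-then-sigmoid gating scheme are assumptions for exposition only.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution with 'same' padding.
    x: (C, H, W); kernels: (C, K, K) -- one (large) kernel per channel,
    so each channel is filtered independently (no cross-channel mixing)."""
    C, H, W = x.shape
    K = kernels.shape[-1]
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + K, j:j + K] * kernels[c])
    return out

def self_conditioned_scale(x, w):
    """Generate per-channel importance factors from the input itself
    (global average pool -> linear map -> sigmoid), then rescale channels.
    w is a hypothetical (C, C) projection standing in for a learned layer."""
    pooled = x.mean(axis=(1, 2))             # squeeze: (C,)
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled)))  # importance factors in (0, 1)
    return x * gate[:, None, None]

# Toy demo: 4 channels, 16x16 feature map, large 11x11 kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
kernels = rng.standard_normal((4, 11, 11)) * 0.01
w = rng.standard_normal((4, 4)) * 0.1
y = self_conditioned_scale(depthwise_conv2d(x, kernels), w)
print(y.shape)  # (4, 16, 16)
```

Because the kernel acts per channel, its cost grows with K² but not with C², which is why a large depth-wise kernel can enlarge the ERF at modest complexity; the cheap point-wise gate then reintroduces input-dependent channel interactions.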