The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can at most be removed during the transform and how many spatial priors can be exploited to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERF remains insufficiently large, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel depth-wise convolutions that remove more redundancy while keeping complexity modest. Because images vary widely in content, we further propose a mechanism that increases the adaptability of these convolutions by generating their weights in a self-conditioned manner. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we adopt the entropy model of each existing transform method so that only the transform differs, obtaining the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments show that these LLIC models significantly improve over their corresponding baselines, reducing BD-Rate on Kodak by 9.49%, 9.47%, and 10.94%, respectively, relative to VTM-17.0 Intra. Our LLIC models achieve state-of-the-art performance and better trade-offs between performance and complexity.
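To make the core idea concrete, the following NumPy code is a minimal sketch of a gated large-kernel depth-wise convolution whose kernels are rescaled by factors computed from the input itself. It is not LLIC's implementation: the kernel size (K = 11), the weight generator (global average pooling followed by a linear layer and a sigmoid scaled to (0, 2)), the GELU gate, and all function and parameter names (`depthwise_conv2d`, `self_conditioned_kernels`, `gated_large_kernel_block`, `base_w`, `fc_w`, `w_gate`, `w_out`) are hypothetical choices made for illustration. Point-wise (1x1) convolutions are written as channel matrix products, and the channel-wise bit allocation and entropy model are omitted.

```python
import numpy as np


def depthwise_conv2d(x, w):
    """Depth-wise 2-D cross-correlation, stride 1, zero 'same' padding.
    x: (C, H, W) feature map; w: (C, K, K), one kernel per channel, K odd."""
    C, H, W = x.shape
    K = w.shape[-1]
    p = K // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=float)
    # Accumulate shifted copies; each channel only sees its own kernel.
    for i in range(K):
        for j in range(K):
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out


def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def self_conditioned_kernels(x, base_w, fc_w, fc_b):
    """Hypothetical generator: scale each channel's kernel by a factor
    predicted from the input's own global statistics."""
    pooled = x.mean(axis=(1, 2))                              # (C,) global context
    scale = 2.0 / (1.0 + np.exp(-(fc_w @ pooled + fc_b)))     # (C,), in (0, 2)
    return base_w * scale[:, None, None]                      # (C, K, K)


def gated_large_kernel_block(x, params):
    """Residual block: x + W_out @ (DWConv_adaptive(x) * GELU(W_gate @ x))."""
    w = self_conditioned_kernels(x, params["base_w"], params["fc_w"], params["fc_b"])
    spatial = depthwise_conv2d(x, w)                                # large-kernel spatial mixing
    gate = gelu(np.einsum("oc,chw->ohw", params["w_gate"], x))      # point-wise gate
    return x + np.einsum("oc,chw->ohw", params["w_out"], spatial * gate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, K = 8, 32, 32, 11  # K = 11 is an assumed kernel size
    params = {
        "base_w": rng.normal(0.0, 0.02, (C, K, K)),
        "fc_w": rng.normal(0.0, 0.1, (C, C)),
        "fc_b": np.zeros(C),
        "w_gate": rng.normal(0.0, 0.1, (C, C)),
        "w_out": rng.normal(0.0, 0.1, (C, C)),
    }
    x = rng.normal(size=(C, H, W))
    print(gated_large_kernel_block(x, params).shape)  # (8, 32, 32)
```

A single K x K depth-wise layer covers a K x K window at a cost of C * K * K weights, whereas a stack of n 3x3 layers has a theoretical receptive field of only (2n + 1) x (2n + 1), so matching an 11 x 11 window would take five such layers. Because the kernels are depth-wise, all cross-channel mixing in this sketch happens in the cheap point-wise projections.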