Transformers have recently been gaining ground as replacements for CNNs in vision tasks, including image compression. This trend prompts us to ask what inherently limits CNNs relative to transformers, and whether CNNs can be enhanced to match or even surpass transformer performance. We aim to design a pure CNN-based model for compression, since most devices are well optimized for CNNs. Our analysis finds that the key strengths of transformers lie in their dynamic weights and large receptive fields. To give CNNs these properties, we propose a novel transform module with large receptive field learning and self-conditioned adaptability for learned image compression, named SLIC. Specifically, we enlarge the receptive field of depth-wise convolution at suitable complexity and generate its weights according to given conditions. We also investigate a self-conditioned factor for channels. To verify the effectiveness of the proposed transform module, we combine it with the existing entropy models ChARM, SCCTX, and SWAtten, obtaining the models SLIC-ChARM, SLIC-SCCTX, and SLIC-SWAtten. Extensive experiments demonstrate that SLIC-ChARM, SLIC-SCCTX, and SLIC-SWAtten improve significantly over their corresponding baselines and achieve state-of-the-art (SOTA) performance with suitable complexity on five test datasets (Kodak, Tecnick, CLIC 20, CLIC 21, JPEGAI). Code will be available at https://github.com/JiangWeibeta/SLIC.