In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent approaches utilize convolutional neural networks (CNNs), variational autoencoders (VAEs), invertible neural networks (INNs), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features alongside local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN, and the network then applies cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.