Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance. Additionally, we introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information, which is crucial for effective compression. These innovations collectively improve the transformation's ability to project data into a more decorrelated latent space, thereby boosting overall compression efficiency. Experimental results demonstrate that our framework surpasses state-of-the-art LIC methods in rate-distortion performance.
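Two of the mechanisms named above can be illustrated with a hedged sketch, since the abstract gives no equations: a pooling-based high/low frequency split (one common approximation of frequency decomposition, not necessarily the paper's exact design) and a channel-wise self-attention that forms a C×C attention map over channels rather than spatial positions. All function names, shapes, and normalization choices below are illustrative assumptions, not the actual HSCATB/CaSA implementation.

```python
import numpy as np

def split_frequencies(x, k=2):
    """Approximate high/low frequency split of a feature map x of shape
    (C, H, W), with H and W divisible by k. Low frequencies are taken as a
    k x k average-pooled (then upsampled) version of x; high frequencies
    are the residual. This is a common approximation, assumed here."""
    C, H, W = x.shape
    # k x k average pooling
    low = x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))
    # nearest-neighbor upsampling back to (C, H, W)
    low_up = np.repeat(np.repeat(low, k, axis=1), k, axis=2)
    high = x - low_up  # residual carries the high-frequency detail
    return low_up, high

def channel_self_attention(x, Wq, Wk, Wv):
    """Channel-wise self-attention sketch: attention weights form a (C, C)
    map across channels instead of a spatial map. x has shape (C, N),
    i.e. a feature map flattened over its N spatial positions."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x                        # (C, N) each
    # L2-normalize along the spatial axis so logits are cosine similarities
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-6)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-6)
    logits = qn @ kn.T                                      # (C, C)
    # softmax over the channel axis
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                                         # (C, N)
```

By construction the two frequency components sum back to the input feature map, and the channel attention output keeps the input's (C, N) shape, so either sketch can be dropped into a larger transform as a residual branch.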