Learned image compression (LIC) has gained traction in recent years as an effective solution for image storage and transmission. However, existing LIC methods produce redundant latent representations because they are limited in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that, for the first time, achieves multiscale directional analysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules that capture the multiscale and directional frequency components of natural images. Additionally, we introduce a frequency-modulation feed-forward network (FMFFN) that adaptively modulates different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and outperforms the latest standardized codec, VTM-12.1, by 14.5%, 15.1%, and 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
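For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed from four rate-distortion points per codec. It is not the evaluation script used in this work. The function names (`lagrange_eval`, `simpson`, `bd_rate`) and the sample points are hypothetical, and the sketch assumes exactly four points per curve with distinct PSNR values, so that the usual cubic fit of log-rate against PSNR reduces to exact interpolation.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the unique interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """Average bitrate difference (percent) of `test` relative to `anchor`.

    Each argument is a list of four (bitrate, psnr) pairs (hypothetical
    input format). A negative result means `test` needs fewer bits than
    `anchor` at equal quality.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects exactly four RD points per curve")

    a_psnr = [q for _, q in anchor]
    t_psnr = [q for _, q in test]
    a_logr = [math.log(r) for r, _ in anchor]
    t_logr = [math.log(r) for r, _ in test]

    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(a_psnr), min(t_psnr))
    hi = min(max(a_psnr), max(t_psnr))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")

    # Cubic through four points: log-rate as a function of PSNR.
    int_anchor = simpson(lambda q: lagrange_eval(a_psnr, a_logr, q), lo, hi)
    int_test = simpson(lambda q: lagrange_eval(t_psnr, t_logr, q), lo, hi)

    # Mean log-rate gap over the shared range, converted to a percentage.
    avg_diff = (int_test - int_anchor) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0


# Synthetic points for illustration only; not measurements from any codec.
anchor_points = [(0.2, 30.0), (0.4, 32.0), (0.8, 34.0), (1.6, 36.0)]
test_points = [(r * 0.855, q) for r, q in anchor_points]
print(round(bd_rate(anchor_points, test_points), 2))
```

In the synthetic example, the test curve uses 85.5% of the anchor bitrate at every quality level, so the result is a BD-rate of about -14.5%, which corresponds to a 14.5% bitrate saving at equal PSNR. Published evaluations often fit more than four points or use other interpolation schemes, so values from this sketch should not be expected to match reported numbers exactly.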