Learned image compression (LIC) has recently achieved great progress and has even outperformed traditional approaches based on the discrete cosine transform (DCT) or discrete wavelet transform (DWT). However, LIC mainly reduces spatial redundancy in its autoencoder networks and entropy coding; it does not explicitly remove frequency-domain correlation the way DCT or DWT does. To leverage the best of both worlds, we propose a surprisingly simple but efficient framework that introduces the DWT into both the convolution layers and the entropy coding of CNN-based LIC. First, in both the core and hyperprior autoencoder networks, we propose a Wavelet-domain Convolution (WeConv) module, which performs convolution after a DWT and then converts the data back to the spatial domain via the inverse DWT. The module is used at selected layers of a CNN to explicitly reduce frequency-domain correlation and make the signal sparser in the DWT domain. We also propose a wavelet-domain Channel-wise Auto-Regressive entropy Model (WeChARM), in which the latent representation produced by the encoder network is first transformed by the DWT before quantization and entropy coding, as in the traditional paradigm. The entropy coding is then split into two steps: all low-frequency DWT coefficients are coded first, and they then serve as a prior for coding the high-frequency coefficients; channel-wise entropy coding is further applied within each step. By combining WeConv and WeChARM, the proposed WeConvene scheme achieves superior rate-distortion (R-D) performance compared to other state-of-the-art LIC methods as well as the latest H.266/VVC. On the Kodak dataset, starting from a baseline network with a -0.4% BD-Rate saving over H.266/VVC, introducing WeConv with the simplest Haar transform improves the saving to -4.7%, which is quite impressive given the simplicity of the Haar transform. Enabling Haar-based WeChARM entropy coding further boosts the saving to -8.2%.
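To make the WeConv idea concrete, the following is a minimal PyTorch sketch of a wavelet-domain convolution layer. It assumes a single-level orthonormal Haar DWT implemented with fixed grouped strided convolutions and a plain 3x3 convolution on the stacked subbands; the `WeConv` name matches the paper, but the kernel size and residual connection are illustrative assumptions, not the authors' exact layer design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def haar_filters():
    # Orthonormal 2x2 Haar analysis filters, shape (4, 1, 2, 2).
    f = torch.tensor([[[0.5, 0.5], [0.5, 0.5]],      # LL: average
                      [[0.5, 0.5], [-0.5, -0.5]],    # LH: vertical detail
                      [[0.5, -0.5], [0.5, -0.5]],    # HL: horizontal detail
                      [[0.5, -0.5], [-0.5, 0.5]]])   # HH: diagonal detail
    return f.unsqueeze(1)


class WeConv(nn.Module):
    """Wavelet-domain convolution: DWT -> conv on subbands -> inverse DWT.

    A sketch only; assumes even spatial dimensions, and the 3x3 kernel
    and residual connection are assumptions, not the paper's exact design.
    """

    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        # One Haar analysis bank per input channel (grouped convolution).
        self.register_buffer("dwt_w", haar_filters().repeat(channels, 1, 1, 1))
        # Convolution applied across the four subbands of all channels.
        self.conv = nn.Conv2d(4 * channels, 4 * channels, 3, padding=1)

    def forward(self, x):
        c = self.channels
        # Forward DWT: each channel becomes 4 half-resolution subbands.
        y = F.conv2d(x, self.dwt_w, stride=2, groups=c)
        # Convolution in the wavelet domain, kept residual so the layer
        # starts close to an identity mapping.
        y = y + self.conv(y)
        # Inverse DWT: the Haar basis is orthonormal, so a transposed
        # convolution with the same filters reconstructs exactly.
        return F.conv_transpose2d(y, self.dwt_w, stride=2, groups=c)
```

If the 3x3 convolution is zero-initialized, the module reduces to an exact identity thanks to the perfect-reconstruction property of the orthonormal Haar pair, which makes it easy to drop into selected layers of an existing autoencoder.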
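The two-step coding order of WeChARM can be sketched in the same spirit. The toy model below applies the Haar DWT to the latent, models the LL subband with a simple learned Gaussian prior, "codes" it first, and then conditions the (mean, scale) of the three high-frequency subbands on the decoded LL coefficients, returning an estimated rate in bits. The channel-wise autoregression within each step, the hyperprior input, and the actual arithmetic coder are omitted, and the parameter networks here are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def haar_filters():
    # Same orthonormal 2x2 Haar bank as in the WeConv sketch above.
    f = torch.tensor([[[0.5, 0.5], [0.5, 0.5]],
                      [[0.5, 0.5], [-0.5, -0.5]],
                      [[0.5, -0.5], [0.5, -0.5]],
                      [[0.5, -0.5], [-0.5, 0.5]]])
    return f.unsqueeze(1)


class WeChARMSketch(nn.Module):
    """Two-step wavelet-domain entropy model (toy rate estimator).

    Low-frequency (LL) coefficients are coded first; the decoded LL then
    serves as the prior for the high-frequency subbands. Channel-wise
    autoregression and the hyperprior are omitted for brevity.
    """

    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        self.register_buffer("dwt_w", haar_filters().repeat(channels, 1, 1, 1))
        # Step 1 prior: per-channel Gaussian for LL (a real model would
        # derive these parameters from a hyperprior instead).
        self.low_mean = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.low_logs = nn.Parameter(torch.zeros(1, channels, 1, 1))
        # Step 2 prior: decoded LL predicts (mean, scale) for LH, HL, HH.
        self.high_params = nn.Sequential(
            nn.Conv2d(channels, 2 * channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(2 * channels, 6 * channels, 3, padding=1))

    @staticmethod
    def _bits(y_hat, mean, scale):
        # Rate of integer symbols under a discretized Gaussian.
        d = torch.distributions.Normal(mean, scale)
        p = (d.cdf(y_hat + 0.5) - d.cdf(y_hat - 0.5)).clamp_min(1e-9)
        return -torch.log2(p).sum()

    def forward(self, y):
        c = self.channels
        # DWT of the latent; the grouped conv interleaves subbands per
        # channel as (LL, LH, HL, HH), so reshape to separate them.
        w = F.conv2d(y, self.dwt_w, stride=2, groups=c)
        w = w.view(w.size(0), c, 4, w.size(2), w.size(3))
        low = w[:, :, 0]                   # (B, c, H/2, W/2)
        high = w[:, :, 1:].flatten(1, 2)   # (B, 3c, H/2, W/2)
        # Step 1: quantize and code all low-frequency coefficients.
        # (Training would use additive noise or straight-through rounding.)
        low_hat = torch.round(low)
        bits = self._bits(low_hat, self.low_mean, self.low_logs.exp())
        # Step 2: decoded LL conditions the high-frequency prior.
        mean, logs = self.high_params(low_hat).chunk(2, dim=1)
        bits = bits + self._bits(torch.round(high), mean, logs.exp())
        return bits  # total estimated rate in bits
```

Because the DWT concentrates most of the latent's energy in the LL subband, the high-frequency coefficients are sparse and cheap to code once LL is available as a prior, which is the intuition behind the two-step split.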