Autoencoder-based structures have dominated recent learned image compression methods. However, the inherent information loss of autoencoders limits their rate-distortion performance at high bit rates and restricts their flexibility in rate adaptation. In this paper, we present a variable-rate image compression model based on an invertible transform to overcome these limitations. Specifically, we design a lightweight multi-scale invertible neural network that bijectively maps the input image into multi-scale latent representations. To improve compression efficiency, a multi-scale spatial-channel context model with extended gain units is devised to estimate the entropy of the latent representations from high to low levels. Experimental results demonstrate that the proposed method achieves state-of-the-art performance among existing variable-rate methods and remains competitive with recent multi-model approaches. Notably, our method is the first learned image compression solution that outperforms VVC with a single model across a very wide range of bit rates, especially at high bit rates. The source code is available at \href{https://github.com/hytu99/MSINN-VRLIC}{https://github.com/hytu99/MSINN-VRLIC}.
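To make the bijective mapping concrete, the following is a minimal PyTorch sketch of the kind of invertible building blocks such a transform can stack: an additive coupling layer plus an invertible squeeze (space-to-depth) step, with part of the channels split off as a latent at each scale. The names (AdditiveCoupling, squeeze) and the toy dimensions are our own illustration, not the paper's actual MS-INN implementation.

\begin{verbatim}
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    # Splits channels in two and shifts one half by a function of the
    # other. Exactly invertible regardless of what self.net computes.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half, channels - half, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels - half, channels - half, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.net(x1)], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.net(y1)], dim=1)

def squeeze(x):
    # Invertible 2x downsampling: moves spatial detail into channels.
    b, c, h, w = x.shape
    x = x.view(b, c, h // 2, 2, w // 2, 2)
    return x.permute(0, 1, 3, 5, 2, 4).reshape(b, 4 * c, h // 2, w // 2)

# One multi-scale stage: squeeze, couple, then split half the channels
# off as this scale's latent and pass the remainder to the next stage.
x = torch.randn(1, 3, 64, 64)              # toy RGB input
z = squeeze(x)                              # (1, 12, 32, 32)
layer = AdditiveCoupling(12)
y = layer(z)
latent, remainder = y.chunk(2, dim=1)       # latent kept at this scale

# Exact invertibility: recover the input (up to floating point) from
# the full set of transformed features.
z_rec = layer.inverse(y)
print(torch.allclose(z_rec, z, atol=1e-6))  # True
\end{verbatim}

Because every block has an exact inverse, the decoder can reconstruct the input from the full latents without the information loss inherent to an autoencoder bottleneck; the multi-scale structure arises from repeating the squeeze-couple-split stage on the remainder channels.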