Learned Image Compression (LIC) has explored various architectures, such as Convolutional Neural Networks (CNNs) and transformers, to model image content distributions and achieve effective compression. However, attaining high rate-distortion performance while maintaining low computational complexity (\ie, parameters, FLOPs, and latency) remains challenging. In this paper, we propose a hybrid Convolution and State Space Models (SSMs) based image compression framework, termed \textit{CMamba}, to achieve superior rate-distortion performance with low computational complexity. Specifically, CMamba introduces two key components: a Content-Adaptive SSM (CA-SSM) module and a Context-Aware Entropy (CAE) module. First, we observe that SSMs excel at modeling overall content but tend to lose high-frequency details, whereas CNNs are proficient at capturing local details. Motivated by this, we propose the CA-SSM module, which dynamically fuses global content extracted by SSM blocks with local details captured by CNN blocks in both the encoding and decoding stages, so that important image content is well preserved during compression. Second, the proposed CAE module is designed to reduce spatial and channel redundancies in latent representations after encoding. Specifically, CAE leverages SSMs to parameterize the spatial content of latent representations; benefiting from SSMs, it significantly improves spatial compression efficiency while reducing spatial content redundancies. Moreover, along the channel dimension, CAE reduces inter-channel redundancies of latent representations in an autoregressive manner, fully exploiting prior knowledge from previous channels without sacrificing efficiency. Experimental results demonstrate that CMamba achieves superior rate-distortion performance with low computational complexity.
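To make the CA-SSM fusion idea concrete, the following is a minimal NumPy sketch of gated fusion between a global (SSM-branch) feature map and a local (CNN-branch) feature map. All names, shapes, and the 1x1-projection gate are illustrative assumptions, not the paper's actual implementation; the random weights stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy feature maps of shape (channels, height, width).
C, H, W = 4, 8, 8
global_feat = rng.standard_normal((C, H, W))  # stand-in for SSM-branch output
local_feat = rng.standard_normal((C, H, W))   # stand-in for CNN-branch output

# Hypothetical 1x1-projection weights that map the concatenated features
# to a single gate value per spatial position (random stand-ins).
w = rng.standard_normal((1, 2 * C))

def content_adaptive_fuse(g, l, w):
    """Gated fusion: a sigmoid gate decides, per spatial position,
    how much global vs. local content to keep."""
    stacked = np.concatenate([g, l], axis=0).reshape(2 * C, H * W)  # (2C, H*W)
    gate = sigmoid(w @ stacked).reshape(1, H, W)                    # (1, H, W)
    # Convex combination: gate -> 1 keeps global content, gate -> 0 keeps local.
    return gate * g + (1.0 - gate) * l

fused = content_adaptive_fuse(global_feat, local_feat, w)
print(fused.shape)  # (4, 8, 8)
```

Because the gate lies in (0, 1), each fused value is a convex combination of the two branches at that position, so regions dominated by high-frequency detail can lean on the CNN branch while smooth regions lean on the SSM branch.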