You Can Mask More For Extremely Low-Bitrate Image Compression

Learned image compression (LIC) methods have progressed significantly in recent years. However, they primarily optimize rate-distortion (R-D) performance at medium and high bitrates (> 0.1 bits per pixel (bpp)), while research on extremely low bitrates remains limited. Moreover, existing methods do not explicitly exploit the structure and texture components that are crucial for image compression, and instead treat them the same as uninformative components within the network. This can cause severe degradation of perceptual quality, especially in low-bitrate scenarios. In this work, inspired by the success of pre-trained masked autoencoders (MAE) in many downstream tasks, we rethink the MAE mask sampling strategy from the perspectives of structure and texture, aiming for greater redundancy reduction and more discriminative feature representations, and thereby further unlocking the potential of LIC methods. To this end, we present a dual-adaptive masking approach (DA-Mask) that samples visible patches based on the structure and texture distributions of the original image. We combine DA-Mask with a pre-trained MAE in masked image modeling (MIM) to form an initial compressor that abstracts informative semantic context and texture representations. This pipeline cooperates well with LIC networks, which perform a secondary compression while preserving promising reconstruction quality. Building on this, we propose a simple yet effective masked compression model (MCM), the first framework that unifies MIM and LIC end-to-end for extremely low-bitrate image compression. Extensive experiments demonstrate that our approach outperforms recent state-of-the-art methods in R-D performance, visual quality, and downstream applications at very low bitrates. Our code is available at https://github.com/lianqi1008/MCM.git.
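To make the idea of sampling visible patches from structure and texture distributions more concrete, the following is a minimal illustrative sketch, not the DA-Mask implementation from the paper or its repository. It assumes a grayscale image, uses mean gradient magnitude as a stand-in structure cue and per-patch intensity variance as a stand-in texture cue, and mixes the two into one sampling distribution. The function names and the parameters `patch`, `keep_ratio`, and `alpha` are hypothetical and chosen only for this example.

```python
import numpy as np


def patch_scores(img, patch=16):
    """Return hypothetical (structure, texture) scores per patch of a 2-D image."""
    h, w = img.shape
    gh, gw = h // patch, w // patch
    # Crop so the image divides evenly into patches.
    img = img[:gh * patch, :gw * patch].astype(np.float64)

    # Assumption: gradient magnitude serves as a rough structure (edge) cue.
    gy, gx = np.gradient(img)
    grad = np.hypot(gx, gy)

    def blocks(a):
        # Rearrange an (H, W) array into (num_patches, patch * patch).
        return (a.reshape(gh, patch, gw, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(gh * gw, -1))

    structure = blocks(grad).mean(axis=1)
    # Assumption: intensity variance serves as a rough texture cue.
    texture = blocks(img).var(axis=1)
    return structure, texture


def sample_visible(img, patch=16, keep_ratio=0.25, alpha=0.5, seed=0):
    """Sample visible patch indices with probability tied to both cues."""
    structure, texture = patch_scores(img, patch)

    def normalize(s):
        total = s.sum()
        if total > 0:
            return s / total
        return np.full_like(s, 1.0 / s.size)

    # Mix the two distributions; alpha weights structure against texture.
    prob = alpha * normalize(structure) + (1.0 - alpha) * normalize(texture)
    # Small uniform floor so that flat patches can still be selected.
    prob = 0.9 * prob + 0.1 / prob.size
    prob /= prob.sum()

    n_keep = max(1, int(round(keep_ratio * prob.size)))
    rng = np.random.default_rng(seed)
    visible = rng.choice(prob.size, size=n_keep, replace=False, p=prob)
    return np.sort(visible), prob
```

In this sketch, patches that are not returned in `visible` would be treated as masked. How the actual method defines and combines its structure and texture distributions is specified in the paper and code, not here.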
