In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the help of a Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use the LPM to extract prompts from the input images at the encoder side and from the hidden features at the decoder side. These prompts are fed as additional information into the Swin Transformer layers of a pre-trained transformer-based image compression model, where they influence the allocation of attention regions and bits and thereby change the model's target compression ratio. To keep the network lightweight, we build the prompt networks with fewer convolutional layers. Extensive experiments show that, compared with approaches that optimize a separate model for each target rate, the proposed method reaches the same performance with 80% savings in parameter storage and 90% savings in dataset usage. Moreover, our model outperforms all current variable-bitrate image compression methods in rate-distortion performance and approaches state-of-the-art fixed-bitrate image compression methods trained from scratch.
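To make the prompt mechanism in the abstract more concrete, the following is a minimal sketch of how extra prompt vectors can shift the attention weights inside a transformer layer. It is not the authors' implementation. It assumes, purely for illustration, that prompts are appended to the keys and values only, and it omits learned projections, multiple heads, and Swin window partitioning. The names `attention` and `prompted_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Identity projections are used here (an assumption); a real layer
    would apply learned query, key, and value matrices first.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out = [
            sum(wi * v[d] for wi, v in zip(w, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def prompted_attention(tokens, prompts):
    """Hypothetical prompt injection: prompts extend keys and values only,
    so the output still has one vector per image token."""
    keys_values = tokens + prompts
    return attention(tokens, keys_values, keys_values)


# Toy data: two image tokens and one prompt vector (all values made up).
tokens = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[2.0, 2.0]]

plain_out, plain_w = attention(tokens, tokens, tokens)
prompt_out, prompt_w = prompted_attention(tokens, prompts)
```

In this toy setting the prompt takes a share of each token's attention mass, so the weights over the image tokens are redistributed. That is the kind of effect the abstract refers to when it says prompts influence the allocation of attention; how this maps to a bit allocation is specific to the compression model and is not modeled here.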