This paper presents a Transformer-based image compression system that supports a variable image quality objective chosen according to the user's preference. Optimizing a learned codec for different quality objectives yields reconstructed images with different visual characteristics. Our method lets the user choose a trade-off between two image quality objectives using a single, shared model. Motivated by the success of prompt tuning, we introduce prompt tokens to condition our Transformer-based autoencoder. A learned prompt generation network produces these tokens adaptively from the user's preference and the input image. Extensive experiments on commonly used quality metrics show that our method effectively adapts the encoding and/or decoding process to a variable quality objective. Despite this added flexibility, our method achieves rate-distortion performance comparable to that of single-objective methods.
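The sketch below illustrates the two ideas in the abstract under assumptions of our own; the abstract does not state these formulations. We assume the user preference is a scalar ω in [0, 1] that blends two distortion terms inside a rate-distortion loss. We also assume the prompt tokens are produced by a single linear map of ω and a mean-pooled image summary, then prepended to the patch-token sequence so that Transformer attention layers can read them. The names `PromptGenerator`, `mixed_distortion`, `rd_loss`, and `condition_tokens` are hypothetical. The actual prompt generation network is learned, and its architecture is not specified here.

```python
import random
from typing import List

Vector = List[float]


def mixed_distortion(d_a: float, d_b: float, omega: float) -> float:
    """Hypothetical preference-weighted distortion: omega=0 uses only
    quality objective A, omega=1 uses only quality objective B."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return (1.0 - omega) * d_a + omega * d_b


def rd_loss(rate: float, d_a: float, d_b: float, omega: float, lam: float) -> float:
    """Assumed rate-distortion objective R + lam * D_omega."""
    return rate + lam * mixed_distortion(d_a, d_b, omega)


class PromptGenerator:
    """Toy stand-in for a learned prompt generation network (assumption):
    one random linear map from [omega, image summary] to prompt tokens."""

    def __init__(self, dim: int, num_prompts: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim = dim + 1  # image summary (dim values) plus the preference scalar
        # weights[k][j] is the input-weight row for channel j of prompt token k
        self.weights = [[[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
                         for _ in range(dim)] for _ in range(num_prompts)]

    def __call__(self, omega: float, image_tokens: List[Vector]) -> List[Vector]:
        # Summarize the image by mean-pooling its patch tokens
        n = len(image_tokens)
        summary = [sum(tok[i] for tok in image_tokens) / n
                   for i in range(len(image_tokens[0]))]
        x = [omega] + summary
        # Each prompt token is a linear function of the preference and the image
        return [[sum(w * xi for w, xi in zip(row, x)) for row in prompt_w]
                for prompt_w in self.weights]


def condition_tokens(image_tokens: List[Vector], omega: float,
                     generator: PromptGenerator) -> List[Vector]:
    """Prepend the generated prompt tokens to the patch-token sequence."""
    return generator(omega, image_tokens) + image_tokens


if __name__ == "__main__":
    tokens = [[0.1 * (i + j) for j in range(4)] for i in range(6)]  # 6 tokens, dim 4
    gen = PromptGenerator(dim=4, num_prompts=2)
    seq = condition_tokens(tokens, omega=0.3, generator=gen)
    print(len(seq), len(seq[0]))  # 8 4
    print(rd_loss(rate=0.5, d_a=0.02, d_b=0.1, omega=0.25, lam=10.0))
```

Because ω enters the prompt generator directly, one shared set of autoencoder weights can receive different prompts for different preferences. This mirrors the abstract's single-model claim, but only at the level of a toy illustration.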