CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which gives it strong application potential. However, CMC has shortcomings in both consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. The benchmark covers 18,000 images for evaluating 6 mainstream I2T models and 40,000 images for evaluating 12 T2I models, and includes 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper shows that some combinations of I2T and T2I models have surpassed the most advanced visual signal codecs; it also highlights where LMMs can be further optimized for the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
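To give a rough sense of the "0.1% or even lower" figure, the sketch below compares the size of a text caption with the size of an uncompressed image. It is only an illustration: the abstract does not say what the 0.1% is measured against, so the 8-bit RGB baseline, the 1024x1024 resolution, the example caption, and the helper names `raw_size_bytes` and `text_payload_stats` are all assumptions, not part of CMC-Bench.

```python
# Illustrative arithmetic only; not the CMC-Bench evaluation code.
# Assumption: the baseline is an uncompressed 8-bit RGB image.

def raw_size_bytes(width: int, height: int, channels: int = 3,
                   bits_per_channel: int = 8) -> int:
    """Size in bytes of an uncompressed image with the given dimensions."""
    return width * height * channels * bits_per_channel // 8


def text_payload_stats(caption: str, width: int, height: int) -> dict:
    """Compare a UTF-8 caption payload with the raw image it describes."""
    payload = len(caption.encode("utf-8"))
    raw = raw_size_bytes(width, height)
    return {
        "payload_bytes": payload,
        "raw_bytes": raw,
        # Fraction of the raw size that the text payload occupies.
        "size_ratio": payload / raw,
        # Bits per pixel, the usual unit for codec bitrate.
        "bpp": payload * 8 / (width * height),
    }


# Hypothetical caption standing in for an I2T model's output.
caption = "A red bicycle leaning against a white brick wall in afternoon light."
stats = text_payload_stats(caption, 1024, 1024)
print(stats)
```

Under these assumptions a one-sentence caption is far below 0.1% of the raw size, which leaves room in the budget for a longer description or extra side information while staying in the ultra-low bitrate range.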

