Image Compression with Product Quantized Masked Image Modeling

Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer with a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: it implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modeling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance is on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g., adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate in a hybrid mode between compression and generation, with no further training or fine-tuning. Building on this, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.
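
The product quantizer mentioned above can be illustrated with a minimal sketch. It assumes contiguous sub-vector splitting, squared Euclidean distance and random (untrained) codebooks; the helper names (`split`, `pq_encode`, `pq_decode`, `implicit_codebook_size`, `fixed_length_bits`) and the toy sizes are hypothetical, and this is not the PQ-MIM implementation. It shows how M small sub-codebooks of K centroids each act as one implicit codebook of K**M codewords.

```python
import itertools
import math
import random


def split(vec, m):
    """Split a D-dim vector into m contiguous sub-vectors of length D // m."""
    if len(vec) % m != 0:
        raise ValueError("vector length must be divisible by the number of sub-quantizers")
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Quantize each sub-vector independently to its nearest centroid.

    Returns one integer index per sub-codebook (M indices in total).
    """
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda k: sq_dist(s, cb[k]))
        for s, cb in zip(subs, codebooks)
    ]


def pq_decode(codes, codebooks):
    """Reconstruct a vector by concatenating the selected centroid of every sub-codebook."""
    out = []
    for k, cb in zip(codes, codebooks):
        out.extend(cb[k])
    return out


def implicit_codebook_size(m, k):
    """M sub-codebooks of K centroids each span K**M distinct full codewords."""
    return k ** m


def fixed_length_bits(m, k):
    """Bits per latent vector if each of the M indices is sent with log2(K) bits (no entropy model)."""
    return m * math.log2(k)


if __name__ == "__main__":
    random.seed(0)
    D, M, K = 6, 3, 4  # hypothetical toy sizes: 6-dim latents, 3 sub-vectors, 4 centroids each
    codebooks = [[[random.gauss(0, 1) for _ in range(D // M)] for _ in range(K)] for _ in range(M)]
    x = [random.gauss(0, 1) for _ in range(D)]

    codes = pq_encode(x, codebooks)
    # Brute-force search over all K**M = 64 combined codewords, for comparison.
    brute = min(itertools.product(range(K), repeat=M),
                key=lambda c: sq_dist(x, pq_decode(c, codebooks)))
    print("PQ codes:", codes, "brute-force codes:", list(brute))
    print(implicit_codebook_size(8, 256), fixed_length_bits(8, 256))
```

Because the squared distance decomposes over sub-vectors, the per-sub-vector argmin coincides with the argmin over all K**M combined codewords, which is what keeps the implicit codebook tractable. With the hypothetical setting M = 8, K = 256, a fixed-length code costs 64 bits per latent vector, so a 200-byte (1,600-bit) budget would cover only 25 latent positions; a conditional entropy model, such as the MIM-based one described in the abstract, is meant to bring the rate below this fixed-length cost.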
