GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

Transformer-based entropy models have gained prominence in recent years because they capture long-range dependencies in probability distribution estimation better than convolution-based methods. However, previous transformer-based entropy models suffer from a slow coding process, caused by pixel-wise autoregression or by duplicated computation during inference. In this paper, we propose a novel transformer-based entropy model called GroupedMixer, which offers both faster coding and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression: it first partitions the latent variables into groups along the spatial and channel dimensions, and then entropy codes the groups with the proposed transformer-based entropy model. The global causal self-attention is decomposed into more efficient group-wise interactions, implemented using inner-group and cross-group token-mixers. The inner-group token-mixer incorporates contextual elements within a group, while the cross-group token-mixer interacts with previously decoded groups. Alternating the two token-mixers enables global contextual reference. To further speed up network inference, we introduce context cache optimization to GroupedMixer, which caches attention activation values in the cross-group token-mixers and avoids complex and duplicated computation. Experimental results demonstrate that the proposed GroupedMixer achieves state-of-the-art rate-distortion performance with fast compression speed.
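
The group-wise autoregression described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `partition_into_groups` and `cross_group_mask`, the parameters `channel_slices` and `spatial_step`, the strided spatial pattern, and the coding order are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows how a latent tensor might be split into spatial-channel groups and how a group-level causal mask restricts each group to previously decoded groups.

```python
import numpy as np


def partition_into_groups(latent, channel_slices=2, spatial_step=2):
    """Split a (C, H, W) latent into channel_slices * spatial_step**2 groups.

    Hypothetical layout: channels are cut into equal slices, and each slice
    is subsampled on a strided spatial grid. The list order stands in for
    the coding order, which is an assumption of this sketch.
    """
    c, h, w = latent.shape
    if c % channel_slices or h % spatial_step or w % spatial_step:
        raise ValueError("latent shape must be divisible by the group layout")
    per_slice = c // channel_slices
    groups = []
    for k in range(channel_slices):
        chunk = latent[k * per_slice:(k + 1) * per_slice]
        for dy in range(spatial_step):
            for dx in range(spatial_step):
                # Every spatial_step-th position, offset by (dy, dx).
                groups.append(chunk[:, dy::spatial_step, dx::spatial_step])
    return groups


def cross_group_mask(num_groups):
    """mask[i, j] is True when group i may use group j as context.

    Strictly lower triangular: only groups decoded earlier are visible.
    Whether a model also exposes the current group depends on how its
    inputs are shifted, which this sketch does not model.
    """
    return np.tril(np.ones((num_groups, num_groups), dtype=bool), k=-1)


# Toy latent with 4 channels on a 4x4 grid.
latent = np.arange(4 * 4 * 4).reshape(4, 4, 4)
groups = partition_into_groups(latent)
mask = cross_group_mask(len(groups))
```

With the defaults, the toy latent yields 8 groups of shape (2, 2, 2), and every latent element belongs to exactly one group. The first group has no visible context, so in this sketch it would have to be coded from other information alone, and each later group sees all earlier ones.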
