Transformer-based entropy models have gained prominence in recent years because they capture long-range dependencies in probability distribution estimation better than convolution-based methods. However, previous transformer-based entropy models suffer from slow coding, caused by pixel-wise autoregression or duplicated computation during inference. In this paper, we propose a novel transformer-based entropy model called GroupedMixer, which achieves both faster coding and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression: we first partition the latent variables into groups along the spatial and channel dimensions, and then entropy code the groups with the proposed transformer-based entropy model. The global causal self-attention is decomposed into more efficient group-wise interactions, implemented using inner-group and cross-group token-mixers. The inner-group token-mixer incorporates contextual elements within a group, while the cross-group token-mixer interacts with previously decoded groups. Alternating the two token-mixers gives each token access to the global context. To further speed up network inference, we introduce context cache optimization to GroupedMixer, which caches attention activation values in the cross-group token-mixers and avoids complex and duplicated computation. Experimental results demonstrate that the proposed GroupedMixer achieves state-of-the-art rate-distortion performance with fast compression speed.
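The group-wise partitioning and the cached cross-group interaction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the choice of 2 channel slices with a 2x2 spatial stride, the single-head attention without learned layers, and the use of separate side-information features as queries are all assumptions made for illustration. The cache here stores keys and values of already processed groups, a generic key/value cache standing in for the context cache mentioned in the abstract.

```python
import numpy as np

def partition_groups(y, channel_slices=2, stride=2):
    """Split a latent y of shape (C, H, W) into channel_slices * stride**2 groups.

    Each group is returned as a token matrix of shape (N, d), where
    d = C // channel_slices and N = (H // stride) * (W // stride).
    The grouping scheme is a hypothetical example, not the paper's exact one.
    """
    C, H, W = y.shape
    assert C % channel_slices == 0 and H % stride == 0 and W % stride == 0
    d = C // channel_slices
    groups = []
    for c in range(channel_slices):
        block = y[c * d:(c + 1) * d]
        for i in range(stride):
            for j in range(stride):
                sub = block[:, i::stride, j::stride]  # (d, H/stride, W/stride)
                groups.append(sub.reshape(d, -1).T)   # (N, d) tokens
    return groups

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, k, v):
    """Plain single-head scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def cross_group_cached(queries, groups, wq, wk, wv):
    """Group g attends to groups 0..g-1, reusing cached keys and values.

    queries[g] holds side-information tokens for group g (an assumption),
    so the context for group g never depends on group g's own values.
    """
    k_cache, v_cache, outputs = [], [], []
    for q_tokens, g_tokens in zip(queries, groups):
        if k_cache:
            out = attend(q_tokens @ wq,
                         np.concatenate(k_cache), np.concatenate(v_cache))
        else:
            # The first group has no previously decoded context.
            out = np.zeros((q_tokens.shape[0], wv.shape[1]))
        outputs.append(out)
        # After group g is "decoded", project it once and keep the result.
        k_cache.append(g_tokens @ wk)
        v_cache.append(g_tokens @ wv)
    return outputs

def cross_group_recompute(queries, groups, wq, wk, wv):
    """Reference version that re-projects all earlier groups at every step."""
    outputs = []
    for idx, q_tokens in enumerate(queries):
        if idx == 0:
            outputs.append(np.zeros((q_tokens.shape[0], wv.shape[1])))
            continue
        ctx = np.concatenate(groups[:idx])
        outputs.append(attend(q_tokens @ wq, ctx @ wk, ctx @ wv))
    return outputs
```

Both functions produce the same outputs. The cached version projects each group to keys and values exactly once, whereas the reference version repeats those projections for every later group, which is the kind of duplicated computation that caching removes.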