Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression

Entropy estimation is essential for the performance of learned image compression. Transformer-based entropy models have been shown to be critical for achieving a high compression ratio, but at the expense of significant computational effort. In this work, we introduce the Efficient Contextformer (eContextformer), a computationally efficient transformer-based autoregressive context model for learned image compression. The eContextformer fuses the patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling, and introduces a shifted window spatio-channel attention mechanism. We explore better training strategies and architectural designs and introduce additional complexity optimizations. During decoding, the proposed optimization techniques dynamically scale the attention span and cache the previous attention computations, drastically reducing the model and runtime complexity. Compared to the non-parallel approach, our proposal has ~145x lower model complexity and ~210x faster decoding speed, and achieves higher average bit savings on the Kodak, CLIC2020, and Tecnick datasets. Additionally, the low complexity of our context model enables online rate-distortion algorithms, which further improve the compression performance. We achieve up to 17% bitrate savings over the intra coding of Versatile Video Coding (VVC) Test Model (VTM) 16.2 and surpass various learning-based compression models.
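To make the idea of grouping for parallel context modeling more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a latent tensor of shape (channels, height, width), equal-sized channel segments, and a two-phase checkerboard split inside each segment. The function name `coding_groups`, the segment count, and the ordering of the groups are all hypothetical choices made for illustration; patch-wise grouping and the attention mechanism itself are not modeled here.

```python
import numpy as np

def coding_groups(channels, height, width, num_segments):
    """Assign every latent element a decoding-step index.

    Illustrative assumption: channels are split into equal segments
    (channel-wise grouping) and each segment is split spatially into two
    checkerboard halves (checkered grouping). Elements that share an index
    could be decoded in parallel.
    """
    if channels % num_segments != 0:
        raise ValueError("channels must be divisible by num_segments")
    seg_size = channels // num_segments
    c = np.arange(channels)[:, None, None]   # shape (C, 1, 1)
    h = np.arange(height)[None, :, None]     # shape (1, H, 1)
    w = np.arange(width)[None, None, :]      # shape (1, 1, W)
    segment = c // seg_size                  # channel-wise group index
    parity = (h + w) % 2                     # checkerboard phase (0 or 1)
    # One possible order (an assumption): both phases of a segment are
    # decoded before moving on to the next segment.
    return segment * 2 + parity              # broadcasts to (C, H, W)

groups = coding_groups(channels=8, height=4, width=4, num_segments=4)
num_steps = int(groups.max()) + 1   # parallel decoding steps
serial_steps = groups.size          # one step per element if fully serial
```

Under these assumptions, the number of decoding steps depends only on the number of channel segments (two steps per segment) rather than on the spatial size of the latent, which is the general reason grouped context models decode faster than fully serial ones.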

