Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific to ViTs, aiming to optimize their performance by tailoring both the algorithmic structure and the underlying hardware accelerator to each other's strengths. Model quantization, by converting high-precision numbers to lower-precision, reduces the computational demands and memory needs of ViTs, allowing the creation of hardware specifically optimized for these quantized algorithms, boosting efficiency. This article provides a comprehensive survey of ViTs quantization and its hardware acceleration. We first delve into the unique architectural attributes of ViTs and their runtime characteristics. Subsequently, we examine the fundamental principles of model quantization, followed by a comparative analysis of the state-of-the-art quantization techniques for ViTs. Additionally, we explore the hardware acceleration of quantized ViTs, highlighting the importance of hardware-friendly algorithm design. In conclusion, this article will discuss ongoing challenges and future research paths. We consistently maintain the related open-source materials at https://github.com/DD-DuDa/awesome-vit-quantization-acceleration.

翻译：视觉Transformer（ViTs）近期引起了广泛关注，成为卷积神经网络（CNNs）在多项视觉应用中的有力替代方案。然而，其庞大的模型规模及高昂的计算与内存需求阻碍了在资源受限设备上的部署。这凸显了针对ViTs进行算法-硬件协同设计的必要性，旨在通过相互适配算法结构与底层硬件加速器以优化性能。模型量化通过将高精度数值转换为低精度表示，降低了ViTs的计算与内存需求，使得能够创建专为量化算法优化的硬件，从而提升效率。本文全面综述了ViTs的量化技术及其硬件加速方法。我们首先深入剖析ViTs的独特架构特性及其运行时特征；随后，阐释模型量化的基本原理，并对当前最先进的ViTs量化方法进行对比分析。此外，我们探讨了量化ViTs的硬件加速方案，强调了硬件友好型算法设计的重要性。最后，本文讨论了现有挑战与未来研究方向。相关开源资料持续维护于https://github.com/DD-DuDa/awesome-vit-quantization-acceleration。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日