Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the need for algorithm-hardware co-design specific to ViTs, in which the algorithmic structure and the underlying hardware accelerator are tailored to each other's strengths. Model quantization converts high-precision numbers to lower-precision representations, reducing the computational and memory demands of ViTs. It also enables hardware designed specifically for the quantized algorithms, which further improves efficiency. This article provides a comprehensive survey of ViT quantization and its hardware acceleration. We first examine the architectural attributes of ViTs and their runtime characteristics. We then review the fundamental principles of model quantization and present a comparative analysis of state-of-the-art quantization techniques for ViTs. We further explore the hardware acceleration of quantized ViTs, highlighting the importance of hardware-friendly algorithm design. Finally, we discuss ongoing challenges and future research directions. We continuously maintain the related open-source materials at https://github.com/DD-DuDa/awesome-vit-quantization-acceleration.
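To make the idea of converting high-precision numbers to lower-precision ones concrete, the following is a minimal sketch of symmetric uniform quantization with a single shared scale. It is an illustration only, not a method taken from this survey: the function names `quantize_symmetric` and `dequantize`, the 8-bit setting, and the sample weights are all assumptions chosen for the example.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1              # largest integer level, 127 for 8 bits
    max_abs = max(abs(v) for v in values)       # range is set by the largest magnitude
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer levels."""
    return [qi * scale for qi in q]


# Illustrative weights; real ViT tensors are far larger.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_symmetric(weights, num_bits=8)
recovered = dequantize(q, scale)

# For values inside the clamping range, the rounding error is at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, scale, max_error)
```

Each weight is stored as a small integer plus one floating-point scale for the whole tensor, which is where the memory saving comes from. Integer arithmetic on such values is also what a dedicated accelerator can exploit.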