Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capabilities on resource-constrained embedded AI systems is challenging, as their inherently large memory footprints and complex computations incur high power/energy consumption. Recently, Spiking Vision Transformer (SViT)-based models have emerged as low-power alternatives to ViT networks. However, their large memory footprints still hinder their applicability to resource-constrained embedded AI systems. Therefore, a methodology is needed to compress SViT models without significantly degrading accuracy. To address this, we propose QSViT, a novel design methodology that compresses SViT models through a systematic quantization strategy across the different network layers. QSViT comprises several key steps: (1) investigating the impact of different precision levels on different network layers, (2) identifying appropriate base quantization settings to guide bit-precision reduction, (3) performing a guided quantization strategy, based on these base settings, to select the appropriate quantization setting, and (4) building an efficient quantized network from the selected setting. Experimental results demonstrate that our QSViT methodology achieves 22.75% memory savings and 21.33% power savings while maintaining accuracy within 2.1% of the original non-quantized SViT model on the ImageNet dataset. These results highlight the potential of QSViT to pave the way toward efficient SViT deployment on resource-constrained embedded AI systems.
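To make step (1) concrete, below is a minimal sketch of a per-layer precision sweep: each layer's weights are quantized to a candidate bit width and the resulting reconstruction error is measured, revealing which layers tolerate aggressive quantization. This is not the authors' code; it assumes symmetric uniform quantization (the exact scheme used by QSViT is not specified in the abstract), and names such as `quantize_uniform` and `sweep_layer_precisions` are hypothetical.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1            # e.g., 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax      # one scale per tensor
    if scale == 0:                        # all-zero tensor: nothing to quantize
        return w.copy()
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def sweep_layer_precisions(layers, bit_widths=(8, 6, 4)):
    """Report the mean-squared quantization error of each layer at each bit width."""
    return {
        name: {b: float(np.mean((w - quantize_uniform(w, b)) ** 2))
               for b in bit_widths}
        for name, w in layers.items()
    }

# Toy example: two synthetic "layers" standing in for SViT weight tensors.
rng = np.random.default_rng(0)
layers = {
    "attention.qkv": rng.normal(0.0, 0.05, size=(64, 64)),
    "mlp.fc1": rng.normal(0.0, 0.02, size=(64, 256)),
}
for name, errors in sweep_layer_precisions(layers).items():
    print(name, errors)
```

In a full pipeline, the per-layer errors (or, better, validation-accuracy drops) from such a sweep would feed steps (2) and (3): layers that tolerate low precision receive smaller bit widths in the base settings, and the guided strategy refines those settings into the final quantized network of step (4).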