AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

Spiking Vision Transformers (SViTs) have emerged as low-power alternatives to conventional ViT models, but their large size hinders deployment on resource-constrained embedded AI systems. To address this, state-of-the-art works have proposed quantization techniques to compress SViT models. However, their manual, human-guided approach requires substantial design time and power/energy consumption to find an appropriate quantization setting for each given network, so it does not scale to quantizing multiple networks. To overcome this limitation, we propose AQ4SViT, a novel automated quantization framework for SViTs that quickly provides quantization settings with good trade-offs between accuracy and memory. AQ4SViT employs two key ideas: (1) a quantization search strategy that evaluates candidate quantization settings under an accuracy constraint; and (2) a search gating policy that quickly evaluates and selects promising quantization candidates by leveraging membrane potential drift as a performance proxy. Within the search gating policy, AQ4SViT offers two search algorithm variants as trade-off options: Greedy search, which is fast but may settle in local optima; and Beam search, which is slower but more likely to find the global optimum because it explores a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds appropriate quantization settings, achieving up to 6.6x faster search time and up to 82.5% memory saving compared to the state-of-the-art, while AQ4SViT-Beam further reduces the memory footprint by up to 90% compared to the state-of-the-art, at the cost of a 4.5x longer search time. All of these results are obtained while keeping accuracy within 1.5% of the original, non-quantized models on the ImageNet dataset. These results show that the AQ4SViT framework advances SViT deployment on embedded AI systems.
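
The trade-off between the Greedy and Beam variants can be illustrated with a minimal Python sketch. It is not the AQ4SViT implementation: the per-layer bit-width search space, the `toy_drift` function (a hypothetical stand-in for the membrane potential drift proxy), the `max_drift` threshold (standing in for the accuracy constraint), and all numbers are assumptions made for illustration. Greedy search is modeled here as beam search with a beam width of 1.

```python
from typing import List, Sequence, Tuple

Config = Tuple[int, ...]  # one bit-width per layer


def memory_bits(config: Config, params: Sequence[int]) -> int:
    """Weight memory in bits for a per-layer bit-width assignment."""
    return sum(bits * n for bits, n in zip(config, params))


def toy_drift(config: Config, sensitivity: Sequence[float]) -> float:
    """Hypothetical proxy: fewer bits on a sensitive layer means more drift."""
    return sum(s / bits for s, bits in zip(sensitivity, config))


def beam_search(params: Sequence[int], sensitivity: Sequence[float],
                bit_choices: List[int], max_drift: float,
                beam_width: int) -> Config:
    """Layer-by-layer search; beam_width=1 behaves as greedy search."""
    full = max(bit_choices)
    beam = [tuple(full for _ in params)]  # start from the widest setting
    for layer in range(len(params)):
        candidates = set(beam)  # leaving the layer unchanged is always allowed
        for cfg in beam:
            for bits in bit_choices:
                cand = cfg[:layer] + (bits,) + cfg[layer + 1:]
                # Gate: discard candidates whose proxy exceeds the threshold.
                if toy_drift(cand, sensitivity) <= max_drift:
                    candidates.add(cand)
        # Keep the beam_width surviving candidates with the smallest memory.
        ranked = sorted(candidates, key=lambda c: (memory_bits(c, params), c))
        beam = ranked[:beam_width]
    return beam[0]


# Toy two-layer network: a small layer followed by a large one (assumed values).
params = [10, 1000]
sensitivity = [1.0, 1.0]
bit_choices = [2, 4, 8]

greedy_cfg = beam_search(params, sensitivity, bit_choices, 0.7, beam_width=1)
beam_cfg = beam_search(params, sensitivity, bit_choices, 0.7, beam_width=3)
print(greedy_cfg, memory_bits(greedy_cfg, params))
print(beam_cfg, memory_bits(beam_cfg, params))
```

In this toy setting, the greedy variant spends the whole drift budget on the small first layer and ends at `(2, 8)`, whereas the wider beam keeps the less aggressive first-layer choices alive and reaches `(8, 2)`, which uses far less memory. The wider beam evaluates more candidates per layer, which mirrors the longer search time reported for the Beam variant.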
