The large size of Spiking Vision Transformers (SViTs) still hinders their embedded implementation, highlighting the need for model compression. State-of-the-art works compress SViT models through unstructured pruning, which requires specialized hardware accelerators tailored to the resulting sparsity patterns to realize the efficiency gains. Moreover, these works rely on a manual approach that needs a long design time to find an appropriate pruning setting for each network, which makes them not scalable. To address these limitations, we propose PrimeSVT, a novel framework that performs automated memory-aware structured pruning on pre-trained SViT models, so that the efficiency gains during inference can be realized on widely used computing architectures. To achieve this, PrimeSVT first sorts the SViT layers by size (i.e., number of parameters) and identifies the layers to prune based on their robustness under different pruning rates. It then compresses the model layer by layer in this order, from the largest layer to the smallest (the so-called prioritized compression policy), while considering the user-defined constraints (i.e., acceptable accuracy and memory saving). In each layer, PrimeSVT employs channel-wise filter pruning based on L2-norm values to structurally remove the non-significant weights. Experimental results show that PrimeSVT saves 26.68% memory through automated single-shot pruning while keeping accuracy within 3 percentage points of the original unpruned SViT model (73.3%): 70.3% without fine-tuning and 72.9% with fine-tuning, thus meeting the accuracy and memory constraints. These results show that the PrimeSVT framework enables design automation for SViTs and their embedded implementation.
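As an illustration of the prioritized compression policy described above, the following is a minimal sketch and not the PrimeSVT implementation. It assumes each layer can be treated as a list of flattened filters, that a single pruning rate is applied to every visited layer, and that an accuracy estimate is supplied by a caller-provided `evaluate` function. All names (`prune_layer`, `prioritized_prune`, `toy_layers`, `evaluate`) are hypothetical, and the sketch ignores the matching input-channel removal in the following layer that real structured pruning requires.

```python
import math
from typing import Callable, Dict, List, Tuple

Filter = List[float]      # weights of one output channel, flattened
Layer = List[Filter]      # a layer is modeled as a list of filters
Model = Dict[str, Layer]  # hypothetical model: layer name -> layer


def l2_norm(f: Filter) -> float:
    """L2 norm of one filter."""
    return math.sqrt(sum(w * w for w in f))


def num_params(layer: Layer) -> int:
    """Number of weights in a layer."""
    return sum(len(f) for f in layer)


def prune_layer(layer: Layer, rate: float) -> Layer:
    """Remove the `rate` fraction of filters with the smallest L2 norm."""
    n_keep = max(1, len(layer) - int(len(layer) * rate))
    # Rank filter indices by norm, largest first, and keep the top n_keep.
    ranked = sorted(range(len(layer)), key=lambda i: l2_norm(layer[i]), reverse=True)
    keep = sorted(ranked[:n_keep])  # restore the original channel order
    return [layer[i] for i in keep]


def prioritized_prune(
    layers: Model,
    rate: float,
    target_saving: float,
    min_accuracy: float,
    evaluate: Callable[[Model], float],
) -> Tuple[Model, float]:
    """Prune layers from largest to smallest until the memory target is met.

    A layer's pruning is kept only if `evaluate` (an assumed, caller-supplied
    accuracy estimate) stays at or above `min_accuracy`.
    """
    total = sum(num_params(l) for l in layers.values())

    def saving(model: Model) -> float:
        return 1 - sum(num_params(l) for l in model.values()) / total

    pruned = dict(layers)
    # Prioritized order: largest layer (by parameter count) first.
    order = sorted(layers, key=lambda name: num_params(layers[name]), reverse=True)
    for name in order:
        if saving(pruned) >= target_saving:
            break  # memory constraint already satisfied
        candidate = dict(pruned)
        candidate[name] = prune_layer(pruned[name], rate)
        if evaluate(candidate) >= min_accuracy:
            pruned = candidate  # accept only if the accuracy constraint holds
    return pruned, saving(pruned)


# Toy model with made-up weights, for illustration only.
toy_layers: Model = {
    "mlp": [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [0.0, 0.2]],  # 8 parameters
    "attn": [[1.0, 1.0], [0.0, 0.1]],                         # 4 parameters
}
```

In this sketch the accuracy check runs once per candidate layer, which mirrors single-shot pruning without fine-tuning; a robustness analysis over several pruning rates, as described in the abstract, would replace the fixed `rate` argument.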