Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Although existing pruning methods can achieve extremely high weight sparsity for deep SNNs, high weight sparsity introduces a workload imbalance problem. Specifically, workload imbalance occurs when hardware units running in parallel are assigned different numbers of non-zero weights. This results in low hardware utilization and thus longer latency and higher energy cost. In preliminary experiments, we show that sparse SNNs (~98% weight sparsity) can suffer from utilization as low as ~59%. To alleviate the workload imbalance problem, we propose u-Ticket, which monitors and adjusts the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, thus guaranteeing that the final ticket achieves optimal utilization when deployed onto the hardware. Experiments indicate that u-Ticket achieves up to 100% hardware utilization, reducing latency by up to 76.9% and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
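The workload imbalance described above can be illustrated with a small sketch. It rests on assumptions that the abstract does not state: utilization is taken here to be the average per-unit workload divided by the largest per-unit workload, and the rows of a weight matrix are assigned to parallel units round-robin. The function names `workload_per_unit` and `utilization` are hypothetical and serve illustration only; they are not the u-Ticket implementation.

```python
def workload_per_unit(weights, num_units):
    """Count the non-zero weights each unit must process.

    Assumption: row r of the weight matrix goes to unit r % num_units.
    """
    nnz_per_row = [sum(1 for w in row if w != 0) for row in weights]
    return [sum(nnz_per_row[u::num_units]) for u in range(num_units)]


def utilization(work):
    """Average workload over peak workload (assumed definition).

    All units wait for the slowest one, so the busiest unit sets the
    latency and the remaining units idle for the difference.
    """
    peak = max(work)
    if peak == 0:
        return 1.0  # nothing to compute, so no unit waits on another
    return sum(work) / (len(work) * peak)


# Toy 4x4 sparse layer mapped onto 2 parallel units.
weights = [
    [0.5, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.3, 0.0, 0.4],
    [0.0, 0.0, 0.7, 0.0],
]
work = workload_per_unit(weights, num_units=2)
print(work, utilization(work))
```

In this toy case one unit receives five non-zero weights and the other only one, so the second unit sits idle most of the time. Equalizing the per-unit counts raises the ratio to 1.0 under the assumed definition.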