Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Although existing pruning methods can achieve extremely high weight sparsity in deep SNNs, that sparsity introduces a workload imbalance problem. Specifically, workload imbalance occurs when hardware units running in parallel are assigned different numbers of non-zero weights, which lowers hardware utilization and thus increases latency and energy cost. In preliminary experiments, we show that sparse SNNs ($\sim$98% weight sparsity) can suffer utilization as low as $\sim$59%. To alleviate the workload imbalance problem, we propose u-Ticket, which monitors and adjusts the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, guaranteeing that the final ticket achieves optimal utilization when deployed onto the hardware. Experiments indicate that u-Ticket can achieve up to 100% hardware utilization, reducing latency by up to 76.9% and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
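To make the workload imbalance concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each parallel hardware unit does work proportional to the number of non-zero weights assigned to it, that all units wait for the slowest one, and that utilization is the mean workload divided by the maximum workload. The function names `utilization` and `assign_rows`, the round-robin row assignment, and the toy masks are all hypothetical and chosen only for illustration.

```python
def utilization(nonzeros_per_unit):
    """Fraction of time the units are busy, on average.

    Assumption: work per unit is proportional to its non-zero weight
    count, and every unit waits until the most loaded unit finishes.
    """
    peak = max(nonzeros_per_unit)
    if peak == 0:
        return 1.0  # nothing to compute, so no unit is left idle
    return sum(nonzeros_per_unit) / (len(nonzeros_per_unit) * peak)


def assign_rows(mask, num_units):
    """Count non-zero weights per unit for a binary pruning mask.

    Assumption: rows of the mask are assigned to units round-robin.
    """
    counts = [0] * num_units
    for i, row in enumerate(mask):
        counts[i % num_units] += sum(row)
    return counts


# Toy sparse mask: 1 marks a weight that survived pruning.
mask = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
counts = assign_rows(mask, num_units=2)   # unit 0 gets 5 weights, unit 1 gets 1
print(counts, utilization(counts))        # imbalanced: utilization 0.6

# Same total workload spread evenly across four units.
print(utilization([12, 3, 7, 2]))         # 0.5
print(utilization([6, 6, 6, 6]))          # 1.0
```

Under these assumptions, equalizing the non-zero count per unit is what drives utilization to 100%, which is the property a utilization-aware pruning method aims to preserve in the final ticket.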