The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, yet dense and redundant in the temporal domain. Although spiking neural networks (SNNs), as event-driven neuromorphic models, have the potential to extract spatio-temporal features from event streams, they do so neither effectively nor efficiently. To address this, we propose an event sparsification spiking framework dubbed Razor SNN, which progressively prunes uninformative event frames. Concretely, at the training stage we extend the dynamic mechanism based on global temporal embeddings, reconstruct the features, and adaptively emphasize the effect of events. At the inference stage, Razor SNN hierarchically eliminates uninformative frames according to a binary mask generated from the trained temporal embeddings. Comprehensive experiments demonstrate that Razor SNN consistently achieves competitive performance on four event-based benchmarks: DVS 128 Gesture, N-Caltech 101, CIFAR10-DVS and SHD.
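The inference-stage pruning described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the binary mask is derived from the temporal embeddings, so the sketch assumes each frame has one scalar score per stage, that the score is passed through a sigmoid, and that frames above a fixed threshold are kept. The names `frame_mask`, `progressive_prune`, `stage_scores` and `threshold` are hypothetical.

```python
import numpy as np


def frame_mask(scores, threshold=0.5):
    """Return a boolean keep/drop mask from per-frame scores.

    Assumption (not from the paper): scores are squashed with a sigmoid
    and a frame is kept when its probability exceeds `threshold`.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return probs > threshold


def progressive_prune(frames, stage_scores, threshold=0.5):
    """Prune event frames stage by stage along the temporal axis.

    frames:       array of shape (T, C, H, W), one event frame per time step.
    stage_scores: list of length-T score arrays, one per stage (hypothetical
                  stand-in for the trained temporal embeddings).
    Returns the surviving frames and their original time indices.
    """
    kept = np.arange(frames.shape[0])  # original indices of surviving frames
    for scores in stage_scores:
        # Score only the frames that survived the previous stages.
        mask = frame_mask(np.asarray(scores)[kept], threshold)
        frames = frames[mask]
        kept = kept[mask]
    return frames, kept


# Four frames, two pruning stages.
frames = np.zeros((4, 2, 8, 8))
stage_scores = [
    np.array([2.0, -1.0, 0.5, -3.0]),  # stage 1 keeps frames 0 and 2
    np.array([1.0, 1.0, -1.0, 1.0]),   # stage 2 keeps frame 0 only
]
pruned, kept = progressive_prune(frames, stage_scores)
```

Because each stage scores only the frames that survived earlier stages, the temporal axis shrinks hierarchically, so later layers would process fewer time steps in a real network.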