Spiking neural networks (SNNs) are brain-inspired, energy-efficient models that encode information in spatiotemporal dynamics. Recently, directly trained deep SNNs have achieved high performance on classification tasks with very few time steps. However, designing a directly trained SNN for the regression task of object detection remains a challenging problem. To address it, we propose EMS-YOLO, a novel directly trained SNN framework for object detection, which is the first attempt to train a deep SNN for object detection with surrogate gradients rather than with ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly trained SNN with low power consumption. Furthermore, we theoretically analyze and prove that EMS-ResNet can avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods, which require at least 500 time steps, while using only 4 time steps. Our model achieves performance comparable to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO dataset and the event-based Gen1 dataset.
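To make the idea of direct training with surrogate gradients concrete, the following is a minimal sketch and not the EMS-YOLO implementation. It assumes a leaky integrate-and-fire neuron with a hard reset and a rectangular surrogate gradient, neither of which is specified in the abstract. All names and parameter values (`lif_forward`, `v_th`, `decay`, `width`) are hypothetical and chosen only for illustration.

```python
def heaviside(x):
    """Forward spike function: emit 1.0 when the potential reaches threshold."""
    return 1.0 if x >= 0.0 else 0.0


def surrogate_grad(x, width=1.0):
    """Rectangular stand-in for d(spike)/d(potential).

    The true derivative of the step function is zero almost everywhere,
    so training replaces it with 1/width inside a window around threshold.
    This particular shape is an assumption, not taken from the paper.
    """
    return 1.0 / width if abs(x) < width / 2.0 else 0.0


def lif_forward(inputs, v_th=0.5, decay=0.25, v_reset=0.0):
    """Simulate one neuron over len(inputs) time steps.

    Returns the spike train and the surrogate gradient at each step.
    """
    v = v_reset
    spikes, grads = [], []
    for x in inputs:
        v = decay * v + x                    # leaky integration of the input
        s = heaviside(v - v_th)              # non-differentiable spike
        spikes.append(s)
        grads.append(surrogate_grad(v - v_th))  # used in place of the true gradient
        if s:
            v = v_reset                      # hard reset after firing
    return spikes, grads


# Four time steps, matching the time-step budget mentioned in the abstract.
spikes, grads = lif_forward([0.3, 0.3, 1.5, 0.1])
print(spikes)
print(grads)
```

In this sketch the neuron fires only at the third step, and the surrogate gradient is nonzero only where the membrane potential lies within the window around the threshold, which is what lets an error signal pass through the spike during backpropagation.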