Spiking neural networks (SNNs) are brain-inspired, energy-efficient models that encode information in spatiotemporal dynamics. Recently, directly trained deep SNNs have achieved high performance on classification tasks with very few time steps. However, designing a directly trained SNN for object detection, a regression task, remains challenging. To address this problem, we propose EMS-YOLO, a novel directly trained SNN framework for object detection, which is the first attempt to train a deep SNN for object detection with surrogate gradients rather than through ANN-SNN conversion. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly trained SNN with low power consumption. Furthermore, we theoretically analyze and prove that EMS-ResNet can avoid vanishing or exploding gradients. The results demonstrate that our approach outperforms state-of-the-art ANN-SNN conversion methods, which require at least 500 time steps, while using only 4 time steps. Our model achieves performance comparable to that of an ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO dataset and the event-based Gen1 dataset.
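To make the idea of direct training with surrogate gradients more concrete, the following is a minimal sketch and not the EMS-YOLO implementation. It assumes a leaky integrate-and-fire (LIF) neuron with a hard reset and a rectangular surrogate gradient; the abstract does not specify the neuron model, and the names `lif_forward` and `rect_surrogate_grad` and the values of `tau`, `v_th`, and `width` are hypothetical choices for illustration only.

```python
import numpy as np


def lif_forward(inputs, tau=0.5, v_th=0.5):
    """Simulate LIF neurons over T time steps (illustrative assumption).

    inputs: array of shape (T, N) holding the input current per time step.
    Returns a binary spike array of shape (T, N).
    """
    num_steps, num_neurons = inputs.shape
    v = np.zeros(num_neurons)                  # membrane potential
    spikes = np.zeros((num_steps, num_neurons))
    for t in range(num_steps):
        v = tau * v + inputs[t]                # leaky integration of the input
        s = (v >= v_th).astype(float)          # Heaviside step: fire at threshold
        spikes[t] = s
        v = v * (1.0 - s)                      # hard reset to 0 after a spike
    return spikes


def rect_surrogate_grad(v, v_th=0.5, width=1.0):
    """Rectangular stand-in for d(spike)/d(v).

    The Heaviside step has zero gradient almost everywhere, so training
    replaces it with a window of height 1/width centred on the threshold.
    """
    return (np.abs(v - v_th) < width / 2).astype(float) / width


# One neuron driven by a constant input for 4 time steps.
example_spikes = lif_forward(np.full((4, 1), 0.3))
example_grads = rect_surrogate_grad(np.array([0.4, 2.0]))
```

In this sketch the forward pass stays binary, and only the backward pass would use the smooth window. That substitution is what lets a network of such neurons be trained with backpropagation through time instead of being converted from a pretrained ANN.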