Event-based cameras offer high temporal resolution, wide dynamic range, and low power consumption, making them well suited to high-speed and low-light object detection. Spiking neural networks (SNNs) are promising for event-based object recognition and detection due to their spike-driven nature, but they lack efficient training methods and suffer from gradient vanishing and high computational complexity, especially in deep SNNs. Moreover, existing SNN frameworks often fail to handle multi-scale spatiotemporal features effectively, which increases data redundancy and reduces accuracy. To address these issues, we propose CREST, a novel conjointly trained spike-driven framework that exploits spatiotemporal dynamics for event-based object detection. We introduce a conjoint learning rule that accelerates SNN training and alleviates gradient vanishing; it also supports dual operation modes for efficient and flexible implementation on different hardware types. In addition, CREST is a fully spike-driven framework featuring a multi-scale spatiotemporal event integrator (MESTOR) and a spatiotemporal-IoU (ST-IoU) loss. Our approach achieves superior object recognition and detection performance and up to 100x better energy efficiency than state-of-the-art SNN algorithms on three datasets, providing an efficient event-based object detection solution suitable for SNN hardware implementation.
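To make the training difficulty concrete: a spiking neuron emits a binary spike through a Heaviside threshold, whose gradient is zero almost everywhere, which is one source of the gradient-vanishing problem the abstract mentions. The sketch below shows a generic leaky integrate-and-fire (LIF) neuron step and a rectangular surrogate gradient, a common workaround in the SNN literature. This is background illustration only, not CREST's conjoint learning rule; all function names and constants here are illustrative assumptions.

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_th=1.0):
    """One time step of a LIF neuron: leaky integration, threshold, hard reset."""
    v = v + (x - v) / tau                 # leaky integration of input current x
    spike = (v >= v_th).astype(v.dtype)   # Heaviside spike: non-differentiable
    v = v * (1.0 - spike)                 # hard reset wherever a spike fired
    return v, spike

def surrogate_grad(v, v_th=1.0, alpha=2.0):
    """Rectangular surrogate for d(spike)/dv, used in place of the true
    (zero-almost-everywhere) Heaviside derivative during backpropagation."""
    return (np.abs(v - v_th) < 1.0 / alpha).astype(v.dtype) * alpha / 2.0

# Drive one neuron with a constant suprathreshold input for 10 steps:
# the membrane potential crosses v_th every other step, giving 5 spikes.
v = np.zeros(1)
total_spikes = 0
for _ in range(10):
    v, s = lif_step(v, x=np.ones(1) * 1.5)
    total_spikes += int(s[0])
print(total_spikes)  # → 5
```

With a subthreshold input (e.g. `x = 0.8` and `v_th = 1.0`), the potential converges below threshold and no spikes are emitted at all, so no gradient signal flows through the hard threshold; surrogate gradients such as `surrogate_grad` above let deep SNNs be trained with backpropagation despite this.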