Best of Both Worlds: Hybrid SNN-ANN Architecture for Event-based Optical Flow Estimation

Event-based cameras offer a low-power alternative to frame-based cameras for capturing high-speed motion and high-dynamic-range scenes, providing asynchronous streams of sparse events. Spiking Neural Networks (SNNs), with their asynchronous event-driven compute, show great potential for extracting spatio-temporal features from these event streams. In contrast, standard Analog Neural Networks (ANNs) fail to process event data effectively. However, training SNNs is difficult because of additional trainable parameters (thresholds and leaks), vanishing spikes at deeper layers, and a non-differentiable binary activation function, among other issues. Moreover, an additional data structure, the membrane potential, which keeps track of temporal information, must be fetched and updated at every timestep in SNNs. To overcome these challenges, we propose a novel SNN-ANN hybrid architecture that combines the strengths of both. Specifically, we leverage the asynchronous compute capabilities of SNN layers to effectively extract the input temporal information, while the ANN layers offer trouble-free training and implementation on standard machine learning hardware such as GPUs. We provide extensive experimental analysis for assigning each layer to be spiking or analog in nature, leading to a network configuration optimized for performance and ease of training. We evaluate our hybrid architectures for optical flow estimation using event data on the DSEC-flow and Multi-Vehicle Stereo Event-Camera (MVSEC) datasets. The results indicate that our configured hybrid architectures outperform the state-of-the-art ANN-only, SNN-only, and past hybrid architectures in terms of both accuracy and efficiency. Specifically, our hybrid architecture exhibits 31% and 24.8% lower average endpoint error (AEE) at 2.1x and 3.1x lower energy, compared to an SNN-only architecture on the DSEC and MVSEC datasets, respectively.
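
To make the per-timestep membrane-potential bookkeeping concrete, the following is a minimal sketch of a leaky integrate-and-fire (LIF) layer in plain Python. It is an illustration under stated assumptions, not the architecture evaluated here: the class name `LIFLayer`, the multiplicative leak, and the hard reset to zero after a spike are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LIFLayer:
    """Hypothetical leaky integrate-and-fire layer (illustrative names only)."""
    size: int
    threshold: float = 1.0  # firing threshold (a trainable parameter in SNNs)
    leak: float = 0.9       # multiplicative leak factor (also trainable in SNNs)
    membrane: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The membrane potential is the extra per-neuron state an SNN must store.
        self.membrane = [0.0] * self.size

    def step(self, input_current: List[float]) -> List[int]:
        """Advance one timestep: leak, integrate, fire, reset."""
        spikes = []
        for i, current in enumerate(input_current):
            # Fetch the stored potential, apply the leak, add the new input.
            v = self.leak * self.membrane[i] + current
            fired = v >= self.threshold
            # Assumed reset rule: hard reset to zero after a spike.
            self.membrane[i] = 0.0 if fired else v
            spikes.append(1 if fired else 0)
        return spikes


# Usage: a constant input accumulates over timesteps until neuron 0 fires.
layer = LIFLayer(size=2, threshold=1.0, leak=0.5)
outputs = [layer.step([0.6, 0.1]) for _ in range(3)]
print(outputs)
```

The binary `fired` decision is the non-differentiable activation mentioned above, and `membrane` is the state that has to be read and written at every timestep; an ANN layer carries no such state between inputs.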
