Event-based cameras are attracting growing interest within the computer vision community. These sensors operate with asynchronous pixels, each emitting an event, or "spike", when the luminance change at that pixel since its last event exceeds a given threshold. Thanks to their low power consumption, low latency and high dynamic range, they appear particularly well suited to applications with challenging temporal constraints and safety requirements. Event-based sensors are also a natural fit for Spiking Neural Networks (SNNs), since coupling an asynchronous sensor with neuromorphic hardware can yield real-time systems with minimal power requirements. In this work, we seek to develop one such system, using event sensor data from the DSEC dataset together with spiking neural networks to estimate optical flow in driving scenarios. We propose a U-Net-like SNN which, after supervised training, produces dense optical flow estimates. The training objective encourages both a minimal norm of the error vector and a minimal angle between the ground-truth and predicted flow, and the model is trained by back-propagation using a surrogate gradient. In addition, 3D convolutions enlarge the temporal receptive field, which allows the model to capture the dynamic nature of the data. Upsampling after each decoding stage ensures that each decoder's output contributes to the final estimate. Thanks to separable convolutions, the resulting model is lightweight compared to competing approaches, yet still yields reasonably accurate optical flow estimates.
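As an illustration of the training objective described above, the following minimal sketch combines the norm of the error vector with the angle between predicted and ground-truth flow vectors. It is an assumption-laden example rather than the exact loss used in this work: the function name `flow_loss`, the `angle_weight` parameter, the use of a plain mean over pixels, and the choice to measure the angle in radians are all hypothetical. An actual training setup would express the same computation in a differentiable framework.

```python
import math

def flow_loss(pred, gt, angle_weight=1.0):
    """Mean error-vector norm plus a weighted mean angle between 2D flow vectors.

    pred, gt: equal-length sequences of (u, v) flow vectors.
    angle_weight: hypothetical balancing factor (not taken from the paper).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")

    norm_sum = 0.0
    angle_sum = 0.0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        # Norm of the error vector (endpoint error) for this pixel.
        norm_sum += math.hypot(pu - gu, pv - gv)
        # Unsigned angle between the two vectors, in [0, pi].
        # atan2(|cross|, dot) stays well defined for zero-length vectors.
        dot = pu * gu + pv * gv
        cross = pu * gv - pv * gu
        angle_sum += math.atan2(abs(cross), dot)

    n = len(pred)
    return norm_sum / n + angle_weight * angle_sum / n
```

A perfect prediction gives a loss of zero, while a prediction orthogonal to the ground truth is penalised by both terms. Setting `angle_weight=0.0` reduces the sketch to the mean error-vector norm alone.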