Event-based cameras are attracting growing interest within the computer vision community. These sensors operate with asynchronous pixels, each emitting an event, or "spike", when the luminance change at that pixel since its last event exceeds a given threshold. Thanks to their low power consumption, low latency, and high dynamic range, they seem particularly well suited to applications with challenging temporal constraints and safety requirements. Event-based sensors are also an excellent fit for Spiking Neural Networks (SNNs), since coupling an asynchronous sensor with neuromorphic hardware can yield real-time systems with minimal power requirements. In this work, we seek to develop one such system, using event sensor data from the DSEC dataset and a spiking neural network to estimate optical flow in driving scenarios. We propose a U-Net-like SNN which, after supervised training, produces dense optical flow estimates. To this end, we train the model with back-propagation using a surrogate gradient, with a loss that penalizes both the norm of the error vector and the angle between the ground-truth and predicted flow. In addition, 3D convolutions enlarge the temporal receptive field, allowing us to capture the dynamic nature of the data. Upsampling after each decoding stage ensures that each decoder's output contributes to the final estimate. Thanks to separable convolutions, we obtain a model that is light compared to competitors yet still yields reasonably accurate optical flow estimates.
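The training objective described in the abstract combines two terms: the norm of the error vector and the angle between the ground-truth and predicted flow. The sketch below illustrates one plausible way to combine them for a list of per-pixel flow vectors. The function name `flow_loss`, the weighting parameter `angle_weight`, the threshold `eps`, and the choice to skip the angular term for near-zero vectors are all assumptions made for illustration; the abstract does not specify how the two terms are weighted or normalized.

```python
import math


def flow_loss(pred, gt, angle_weight=1.0, eps=1e-8):
    """Mean over pixels of ||pred - gt|| + angle_weight * angle(pred, gt).

    pred, gt: equal-length sequences of (u, v) flow vectors, one per pixel.
    angle_weight and eps are hypothetical parameters for this sketch.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")

    total = 0.0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        # Norm of the error vector (endpoint error).
        norm_term = math.hypot(pu - gu, pv - gv)

        # Angle between predicted and ground-truth flow, in radians.
        denom = math.hypot(pu, pv) * math.hypot(gu, gv)
        if denom < eps:
            # The angle is undefined when either vector is (near) zero;
            # this sketch simply drops the angular term in that case.
            angle_term = 0.0
        else:
            cos = (pu * gu + pv * gv) / denom
            cos = max(-1.0, min(1.0, cos))  # guard against rounding error
            angle_term = math.acos(cos)

        total += norm_term + angle_weight * angle_term

    return total / len(pred)
```

In an actual training setup this computation would be written with differentiable tensor operations so that it can be back-propagated through the network; the pure-Python version here only makes the two terms explicit.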