Semantic segmentation is an important computer vision task, particularly for scene understanding and navigation in autonomous vehicles and UAVs. Many deep neural network architectures have been designed to tackle this task. However, due to their high computational cost and memory consumption, these models are ill-suited for deployment on resource-constrained systems. To address this limitation, we introduce an end-to-end, biologically inspired semantic segmentation approach that combines Spiking Neural Networks (SNNs, a low-power alternative to classical neural networks) with event cameras, whose output can directly feed the inputs of these networks. We have designed EvSegSNN, a biologically plausible encoder-decoder U-shaped architecture based on Parametric Leaky Integrate-and-Fire neurons, with the objective of trading off resource usage against performance. Experiments conducted on DDD17 demonstrate that EvSegSNN outperforms the closest state-of-the-art model in terms of MIoU while reducing the number of parameters by a factor of $1.6$ and dispensing with a batch normalization stage.
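To make the core building block concrete, the sketch below shows a minimal Parametric Leaky Integrate-and-Fire (PLIF) layer in the style of Fang et al.'s formulation, where the membrane time constant $\tau$ is learned through a raw parameter $w$ with $1/\tau = \sigma(w)$. The class name, defaults, and tensor layout are illustrative assumptions, not the paper's code, and a trainable version would replace the Heaviside firing step with a surrogate gradient.

```python
import torch
import torch.nn as nn

class PLIFNeuron(nn.Module):
    """Minimal PLIF layer sketch (illustrative, not the paper's implementation).

    The membrane time constant tau is learned via a raw parameter w,
    with 1/tau = sigmoid(w), so tau stays in (1, +inf).
    """

    def __init__(self, init_w: float = 0.0, v_threshold: float = 1.0, v_reset: float = 0.0):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(init_w))  # one learnable w shared by the layer
        self.v_threshold = v_threshold
        self.v_reset = v_reset

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (T, batch, ...) input current over T time steps (assumed layout)
        inv_tau = torch.sigmoid(self.w)              # 1/tau in (0, 1)
        v = torch.full_like(x_seq[0], self.v_reset)  # membrane potential at rest
        spikes = []
        for x in x_seq:
            # Leaky integration: decay toward v_reset, driven by the input x.
            v = v + inv_tau * (x - (v - self.v_reset))
            # Heaviside firing; training would need a surrogate gradient here.
            s = (v >= self.v_threshold).float()
            # Hard reset of the membrane potential wherever a spike occurred.
            v = v * (1.0 - s) + self.v_reset * s
            spikes.append(s)
        return torch.stack(spikes)                   # (T, batch, ...) binary spike train
```

With `init_w = 0.0`, the initial time constant is $\tau = 2$; unlike a plain LIF neuron with a fixed $\tau$, gradient descent can adapt the leak of each PLIF layer to the data.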