Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

Event cameras are bio-inspired sensors that capture per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of each change. Event cameras offer many advantages over conventional frame-based cameras, including high temporal resolution, high dynamic range, and low latency. Because they can capture information in challenging visual conditions, event cameras have the potential to overcome the limitations of frame-based cameras in the computer vision and robotics communities. In recent years, deep learning (DL) has been brought to this emerging field and has inspired active research into its potential. However, there is still no systematic account of these technical advances, which makes an overview both urgent and necessary. To this end, we conduct the first comprehensive and in-depth survey, with a focus on the latest developments of DL techniques for event-based vision. We first scrutinize the typical event representations, together with quality enhancement methods, as they play a pivotal role as inputs to DL models. We then provide a comprehensive taxonomy of existing DL-based methods by structurally grouping them into two major categories: 1) image reconstruction and restoration; 2) event-based scene understanding and 3D vision. Importantly, we conduct benchmark experiments for existing methods in some representative research directions (e.g., object recognition and optical flow estimation) to identify critical insights and problems. Finally, we discuss the open challenges and offer new perspectives to motivate future research.
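To make the notion of an event stream concrete, the following is a minimal sketch and is not taken from the survey. It assumes each event is a tuple of timestamp, pixel position, and polarity, and it accumulates events into a two-channel per-pixel count grid, which is one simple way to turn an asynchronous stream into a dense input for a DL model. The names `Event` and `events_to_count_frame` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One event: an intensity change at a single pixel (illustrative layout)."""
    t: float  # timestamp in seconds
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 for an intensity increase, -1 for a decrease


def events_to_count_frame(
    events: List[Event], height: int, width: int
) -> List[List[List[int]]]:
    """Accumulate events into a 2 x height x width grid of per-pixel counts.

    Channel 0 counts positive-polarity events, channel 1 counts negative ones.
    Timestamps are ignored here, so temporal ordering is discarded.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for ev in events:
        channel = 0 if ev.p > 0 else 1
        frame[channel][ev.y][ev.x] += 1
    return frame


# A tiny hand-made stream (assumed values, not real sensor data).
events = [
    Event(t=0.0010, x=1, y=0, p=+1),
    Event(t=0.0012, x=1, y=0, p=+1),
    Event(t=0.0015, x=2, y=1, p=-1),
]
frame = events_to_count_frame(events, height=2, width=3)
```

A count grid like this discards the fine timing that gives event cameras their high temporal resolution; it is shown only because it is the simplest dense form to write down.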
