Despite rapid progress in computer vision algorithms, implementing perception and control systems for autonomous vehicles such as drones and self-driving cars remains challenging. Video streams captured by conventional cameras are prone to motion blur and to degraded image quality under difficult lighting conditions. In addition, the frame rate, typically 30 or 60 frames per second, can be a limiting factor in certain scenarios. Event cameras (DVS, Dynamic Vision Sensor) are a potentially attractive technology for addressing these problems. In this paper, we compare two deep learning methods of processing event data for the task of pedestrian detection: convolutional neural networks applied to a video-frame representation of the events, and asynchronous sparse convolutional neural networks. The results illustrate the potential of event cameras and allow us to evaluate the accuracy and efficiency of both methods on high-resolution (1280 x 720 pixels) footage.