The human eye contains two types of photoreceptors: rods and cones. Rods are responsible for monochrome vision and cones for color vision. Rods far outnumber cones, which means that most human visual processing is monochrome. In computer vision, event cameras and color cameras play roles similar to rods and cones: an event camera reports per-pixel changes in intensity and is therefore analogous to rods. Humans can notice objects moving in their peripheral vision (at the far left or far right) without being able to classify them. For example, someone passing by at the edge of your field of view can catch your attention before you know who they are. In this sense, rods act as a region proposal network (RPN) in human vision, and an event camera can likewise act as an RPN in deep learning. Two-stage object detectors such as Mask R-CNN include a backbone for feature extraction and an RPN. The RPN takes a brute-force approach: it scores a dense set of candidate (anchor) boxes at every location of the feature map to find objects. Generating region proposals this way is computationally expensive, which makes two-stage detectors ill-suited to fast applications. This work replaces the RPN of Mask R-CNN in Detectron2 with an event camera that generates proposals for moving objects, saving time and computation. The proposed approach is faster than standard two-stage detectors while achieving comparable accuracy.
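The abstract does not describe how events are turned into proposals. One simple way to picture the idea is the sketch below. It is an illustration under stated assumptions, not the authors' pipeline. Events are assumed to arrive as `(x, y, t, polarity)` tuples. Events inside a short time window are counted per pixel, active pixels are grouped into 8-connected components, and each sufficiently large component is boxed as a proposal for a moving object. The function `events_to_proposals` and its thresholds `min_events_per_pixel` and `min_pixels` are hypothetical. Passing the resulting boxes to Detectron2's second stage is not shown.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity): hypothetical layout
Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max), inclusive pixel coords


def events_to_proposals(
    events: Iterable[Event],
    t_start: float,
    t_end: float,
    min_events_per_pixel: int = 1,
    min_pixels: int = 3,
) -> List[Box]:
    """Hypothetical sketch: turn events in [t_start, t_end) into box proposals."""
    # 1. Count events per pixel inside the time window (polarity is ignored).
    counts: Dict[Tuple[int, int], int] = {}
    for x, y, t, _polarity in events:
        if t_start <= t < t_end:
            counts[(x, y)] = counts.get((x, y), 0) + 1
    active: Set[Tuple[int, int]] = {p for p, c in counts.items() if c >= min_events_per_pixel}

    # 2. Group active pixels into 8-connected components using breadth-first search.
    boxes: List[Box] = []
    seen: Set[Tuple[int, int]] = set()
    for start in sorted(active):  # sorted so the output order is deterministic
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            x, y = queue.popleft()
            component.append((x, y))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in active and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        # 3. Drop tiny blobs (treated here as sensor noise) and box the rest.
        if len(component) >= min_pixels:
            xs = [p[0] for p in component]
            ys = [p[1] for p in component]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


events = [
    (10, 10, 0.001, 1), (11, 10, 0.002, 1), (11, 11, 0.003, -1),  # moving object A
    (40, 5, 0.004, 1), (41, 6, 0.004, 1), (42, 7, 0.005, -1),     # moving object B
    (80, 80, 0.006, 1),                                           # isolated noise event
    (12, 12, 0.050, 1),                                           # outside the 10 ms window
]
print(events_to_proposals(events, 0.0, 0.01))  # → [(10, 10, 11, 11), (40, 5, 42, 7)]
```

Connected-component grouping is used here only because it is short and dependency-free. It is a stand-in technique. A practical system would probably also need noise filtering and box merging tuned to the sensor and scene.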