Existing hand detection algorithms operate on images, so their detection rate is limited by the camera's frame rate. In hand detection applications for moving robotic systems, conventional cameras also suffer from motion blur, especially in low-light conditions. Event-based cameras, which offer high dynamic range, high temporal resolution, and low power consumption, can address these limitations. Recent work has shown that a stereo setup combining an event-based and a frame-based camera improves detection accuracy and the bandwidth-latency tradeoff. The main bottleneck in using event-based cameras for object detection and recognition is the relative scarcity of training data. In this work, we propose a methodology and an exemplary synthetic event-based hand dataset with an egocentric (first-person) view. The data is synthesized from the existing RGB Egohands dataset with the v2e toolbox. The v2e parameters are varied to produce versions of the dataset with different lighting conditions and scales. Ground-truth detections are generated by applying a fine-tuned YOLOv8 model to the RGB images in the Egohands dataset and interpolating the results onto the high-temporal-resolution events. We use the resulting multi-modal dataset to perform hand detection with existing object detection algorithms designed for a multi-modal setup of event and RGB cameras, and demonstrate performance comparable to the state of the art.
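The abstract states that frame-level detections are interpolated onto the high-temporal-resolution events but does not specify the scheme. As a minimal sketch, assuming simple linear interpolation between the two nearest annotated frames, the step might look like the following. The function name `interpolate_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the microsecond timestamps are illustrative assumptions, not details taken from the work itself.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_box(frame_times: Sequence[float],
                    frame_boxes: Sequence[Box],
                    t: float) -> Box:
    """Estimate a bounding box at time t from per-frame boxes.

    Assumes frame_times is sorted in strictly increasing order and that
    frame_boxes[i] is the detection of the same hand in frame i.
    """
    if not frame_times or len(frame_times) != len(frame_boxes):
        raise ValueError("need exactly one box per frame timestamp")

    # Outside the annotated interval, fall back to the nearest frame.
    if t <= frame_times[0]:
        return tuple(frame_boxes[0])
    if t >= frame_times[-1]:
        return tuple(frame_boxes[-1])

    # hi is the first frame strictly later than t; lo is the frame before it.
    hi = bisect_right(frame_times, t)
    lo = hi - 1
    t0, t1 = frame_times[lo], frame_times[hi]
    alpha = (t - t0) / (t1 - t0)

    # Blend each coordinate linearly between the two frames.
    return tuple((1 - alpha) * a + alpha * b
                 for a, b in zip(frame_boxes[lo], frame_boxes[hi]))


# Two frames 40 ms apart (timestamps in microseconds); the hand moves right.
times = [0, 40_000]
boxes = [(100.0, 100.0, 200.0, 200.0), (140.0, 100.0, 240.0, 200.0)]
box_at_10ms = interpolate_box(times, boxes, 10_000)
```

Linear interpolation is only one plausible choice: it assumes roughly constant hand velocity between consecutive frames, which becomes less accurate for fast or erratic motion.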