3D hand tracking methods based on monocular RGB videos are easily affected by motion blur, whereas the event camera, a sensor with high temporal resolution and high dynamic range, is naturally suited to this task thanks to its sparse output and low power consumption. However, 3D annotations of fast-moving hands are difficult to obtain, which makes constructing event-based hand-tracking datasets challenging. In this paper, we present an event-based speed adaptive hand tracker (ESAHT) to solve the hand tracking problem with an event camera. We train a CNN model on a hand tracking dataset with slow motion, which allows it to leverage knowledge from RGB-based hand tracking solutions, and enable it to work on fast hand tracking tasks. To realize our solution, we construct the first 3D hand tracking dataset captured by an event camera in a real-world environment, devise two data augmentation methods to narrow the domain gap between slow- and fast-motion data, develop a speed-adaptive event stream segmentation method to handle hand movements at different speeds, and introduce a new event-to-frame representation method that adapts to event streams of different lengths. Experiments show that our solution outperforms both RGB-based and previous event-based solutions in fast hand tracking tasks. Our code and dataset will be made publicly available.
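The abstract does not say how ESAHT segments the event stream or converts events to frames. Purely as an illustration of the two ideas it names, and not the authors' method, the sketch below shows one common scheme. A window closes once it holds a fixed number of events or spans a maximum duration. Fast motion (a high event rate) therefore produces short windows, while slow motion produces windows capped in time. Each window is accumulated into a two-channel ON/OFF count image and normalized by its event count, so windows of different lengths share a common scale. The functions `segment_events` and `events_to_frame`, the `(t, x, y, p)` event layout, and the parameters `max_count` and `max_duration` are hypothetical.

```python
import numpy as np


def segment_events(events, max_count, max_duration):
    """Hypothetical speed-adaptive segmentation (not ESAHT's actual method).

    events: (N, 4) array with columns (t, x, y, p), sorted by t.
    A window closes when it holds max_count events or when the next event
    is more than max_duration after the window's first event.
    Requires max_count >= 1 and max_duration >= 0.
    """
    windows = []
    start, n = 0, len(events)
    while start < n:
        t0 = events[start, 0]
        end = start
        while (end < n and end - start < max_count
               and events[end, 0] - t0 <= max_duration):
            end += 1
        windows.append(events[start:end])
        start = end
    return windows


def events_to_frame(window, height, width):
    """Hypothetical length-adaptive representation: a 2-channel count image
    (channel 0 = OFF / p <= 0, channel 1 = ON / p > 0) divided by the
    window's event count, so every non-empty frame sums to 1."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    if len(window) == 0:
        return frame
    x = window[:, 1].astype(int)
    y = window[:, 2].astype(int)
    ch = (window[:, 3] > 0).astype(int)
    np.add.at(frame, (ch, y, x), 1.0)  # unbuffered add handles repeated pixels
    return frame / len(window)


# Synthetic stream: 300 events over 0.3 s (slow motion),
# then 300 events over 0.03 s (fast motion).
rng = np.random.default_rng(0)
t = np.concatenate([np.linspace(0.0, 0.3, 300, endpoint=False),
                    0.3 + np.linspace(0.0, 0.03, 300, endpoint=False)])
xs = rng.integers(0, 32, 600)
ys = rng.integers(0, 24, 600)
ps = rng.choice([-1, 1], 600)
events = np.stack([t, xs, ys, ps], axis=1).astype(np.float64)

windows = segment_events(events, max_count=100, max_duration=0.05)
frames = [events_to_frame(w, height=24, width=32) for w in windows]
print(len(windows), [len(w) for w in windows])
```

In the slow segment, windows are cut by `max_duration` and hold fewer events. In the fast segment they are cut by `max_count` and span much less time. Normalizing by event count keeps the frame scale independent of window length.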