Event cameras show great potential for 3D hand pose estimation, especially for addressing the challenges of fast motion and high dynamic range in a low-power manner. However, due to their asynchronous differential imaging mechanism, it is challenging to design an event representation that encodes hand motion information, especially when the hands are not moving (which causes motion ambiguity), and it is infeasible to fully annotate the temporally dense event stream. In this paper, we propose EvHandPose, which uses novel hand flow representations in its Event-to-Pose module to estimate hand pose accurately and alleviate the motion ambiguity issue. To handle sparse annotation, we design contrast maximization and hand-edge constraints in the Pose-to-IWE (Image with Warped Events) module and formulate EvHandPose in a weakly supervised framework. We further build EvRealHands, the first large-scale real-world event-based hand pose dataset, covering several challenging scenes, to bridge the real-synthetic domain gap. Experiments on EvRealHands demonstrate that EvHandPose outperforms previous event-based methods in all evaluation scenes. Compared with RGB-based methods, it achieves accurate and stable hand pose estimation with high temporal resolution in fast-motion and strong-light scenes. It also generalizes well to outdoor scenes and to another type of event camera, and shows potential for the hand gesture recognition task.
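Contrast maximization, as generally used in event-based vision, works as follows. Events are warped along a candidate motion to a common reference time and accumulated into an Image with Warped Events (IWE). The IWE's sharpness is then scored: the correct motion aligns events triggered by the same edge and so maximizes contrast. The following is a minimal sketch of that generic idea, not the EvHandPose implementation. It assumes a single constant optical flow shared by all events, nearest-pixel accumulation, and pixel variance as the contrast measure. EvHandPose instead derives the warp from hand motion. All function names here are hypothetical.

```python
from itertools import product


def warp_events(events, flow, t_ref=0.0):
    """Warp events (x, y, t, polarity) to time t_ref along a constant flow (vx, vy).

    Assumption: one global flow for all events (hypothetical simplification).
    """
    vx, vy = flow
    return [(x - vx * (t - t_ref), y - vy * (t - t_ref)) for x, y, t, _p in events]


def image_of_warped_events(events, flow, height, width, t_ref=0.0):
    """Accumulate warped events into an H x W count image (nearest pixel, ignoring polarity)."""
    iwe = [[0.0] * width for _ in range(height)]
    for xw, yw in warp_events(events, flow, t_ref):
        col, row = round(xw), round(yw)
        if 0 <= row < height and 0 <= col < width:  # drop events warped off the sensor
            iwe[row][col] += 1.0
    return iwe


def contrast(iwe):
    """Variance of IWE pixel values: better-aligned (sharper) images score higher."""
    values = [v for row in iwe for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_flow(events, candidates, height, width):
    """Grid-search the candidate flow that maximizes IWE contrast."""
    return max(candidates, key=lambda f: contrast(image_of_warped_events(events, f, height, width)))


# Usage: a single edge point moving +2 px per time unit along x.
events = [(3 + 2 * t, 5, float(t), 1) for t in range(5)]
candidates = list(product([-2, -1, 0, 1, 2], repeat=2))
print(best_flow(events, candidates, height=10, width=16))
```

With the correct flow (2, 0), all five events collapse onto one pixel, which gives the sharpest IWE. Any other candidate spreads them out or pushes some off the sensor. A real system would optimize a richer motion model with gradient-based methods rather than a grid search.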