Event cameras show great potential in 3D hand pose estimation, especially for addressing the challenges of fast motion and high dynamic range in a low-power way. However, because of the asynchronous differential imaging mechanism, it is challenging to design an event representation that encodes hand motion information, especially when the hands are not moving (which causes motion ambiguity). It is also infeasible to fully annotate the temporally dense event stream. In this paper, we propose EvHandPose, which uses novel hand flow representations in its Event-to-Pose module to estimate hand pose accurately and to alleviate the motion ambiguity issue. To handle sparse annotation, we design contrast maximization and hand-edge constraints in the Pose-to-IWE (Image with Warped Events) module and formulate EvHandPose as a weakly supervised framework. We further build EvRealHands, the first large-scale real-world event-based hand pose dataset, which covers several challenging scenes and helps bridge the real-synthetic domain gap. Experiments on EvRealHands demonstrate that EvHandPose outperforms previous event-based methods in all evaluation scenes. Compared with RGB-based methods, it achieves accurate and stable hand pose estimation with high temporal resolution in fast-motion and strong-light scenes. EvHandPose also generalizes well to outdoor scenes and to another type of event camera, and it shows potential for the hand gesture recognition task.
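The abstract names contrast maximization over an Image with Warped Events (IWE) without giving its formulation. The sketch below shows only the generic idea and is not EvHandPose's actual Pose-to-IWE module. Each event is warped to a reference time along a motion estimate, and the warped events are accumulated into an image. The variance of that image then serves as a contrast score, which is highest when the motion estimate aligns events from the same edge. Several choices here are assumptions: the constant per-event flow, nearest-pixel accumulation, event counts in place of polarity, and variance as the contrast measure. The function names `image_of_warped_events` and `contrast` are hypothetical.

```python
import numpy as np


def image_of_warped_events(xs, ys, ts, flow, t_ref, height, width):
    """Warp events to t_ref along a per-event flow (px/s) and count them per pixel.

    Assumption: each event moves with a constant velocity flow[i] = (vx, vy),
    so an event at (x, y, t) lands at (x + vx*(t_ref - t), y + vy*(t_ref - t)).
    """
    dt = t_ref - ts
    xw = xs + flow[:, 0] * dt
    yw = ys + flow[:, 1] * dt
    # Nearest-pixel voting; bilinear voting is a common smoother alternative.
    xi = np.round(xw).astype(int)
    yi = np.round(yw).astype(int)
    valid = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    iwe = np.zeros((height, width))
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(iwe, (yi[valid], xi[valid]), 1.0)
    return iwe


def contrast(iwe):
    """Contrast of the IWE, measured here as its pixel variance."""
    return float(np.var(iwe))


# Toy example: a vertical edge moving right at 20 px/s, observed at 10 timestamps.
ts = np.repeat(np.arange(10) * 0.1, 20)
ys = np.tile(np.arange(20), 10).astype(float)
xs = 10.0 + 20.0 * ts
n = ts.size

iwe_true = image_of_warped_events(xs, ys, ts, np.tile([20.0, 0.0], (n, 1)), 0.0, 20, 40)
iwe_zero = image_of_warped_events(xs, ys, ts, np.zeros((n, 2)), 0.0, 20, 40)
print(contrast(iwe_true), contrast(iwe_zero))
```

With the correct flow, all events collapse onto a single column, so the IWE is sharp and its variance is high. With zero flow, the edge smears across ten columns and the variance drops. A weakly supervised method can use this kind of score as a training signal on frames that have no pose labels.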