In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting conditions and obstructions (e.g., handwear). The benchmark includes a multi-view and multi-spectral dataset collected from 28 subjects performing hand-object and hand-virtual interactions under diverse scenarios, accurately annotated with 3D hand poses through an automated process. We introduce a new baseline method, TherFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TherFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions.