Understanding hand-object interaction (HOI) is fundamental to computer vision, robotics, and AR/VR. However, conventional hand videos often lack essential physical information such as contact forces and motion signals, and are prone to frequent occlusions. To address these challenges, we present Glove2Hand, a framework that translates HOI videos recorded with multi-modal sensing gloves into photorealistic bare-hand videos while faithfully preserving the underlying physical interaction dynamics. We introduce a novel 3D Gaussian hand model that ensures temporally consistent rendering. The rendered hand is then seamlessly integrated into the scene by a diffusion-based hand restorer, which handles complex hand-object interactions and non-rigid deformations. Leveraging Glove2Hand, we create HandSense, the first multi-modal HOI dataset featuring glove-to-hand videos with synchronized tactile and IMU signals. We demonstrate that HandSense significantly enhances downstream bare-hand applications, including video-based contact estimation and hand tracking under severe occlusion.