Understanding hand-object interaction (HOI) is fundamental to computer vision, robotics, and AR/VR. However, conventional hand videos often lack essential physical information, such as contact forces and motion signals, and suffer from frequent occlusions. To address these challenges, we present Glove2Hand, a framework that translates HOI videos captured with multi-modal sensing gloves into photorealistic bare-hand videos while faithfully preserving the underlying physical interaction dynamics. We introduce a novel 3D Gaussian hand model that ensures temporally consistent rendering. A diffusion-based hand restorer then seamlessly integrates the rendered hand into the scene, effectively handling complex hand-object interactions and non-rigid deformations. Leveraging Glove2Hand, we create HandSense, the first multi-modal HOI dataset featuring glove-to-hand videos with synchronized tactile and inertial measurement unit (IMU) signals. We demonstrate that HandSense significantly enhances downstream bare-hand applications, including video-based contact estimation and hand tracking under severe occlusion.