We address in-the-wild hand-object reconstruction for a known object category in egocentric videos, focusing on temporal periods of stable grasps. We propose the task of Hand-Object Stable Grasp Reconstruction (HO-SGR), the joint reconstruction of all frames during which the hand stably holds the object. This lets us constrain the object's motion relative to the hand, which regularises the reconstruction and improves performance. By analysing the 3D ARCTIC dataset, we identify temporal periods in which the contact area between the hand and the object vertices remains stable. We show that objects within stable grasps move within a single degree of freedom (1 DoF). We therefore propose a method that jointly optimises all frames within a stable grasp by constraining the object's rotation to lie within a latent 1 DoF. We then extend this knowledge to in-the-wild egocentric videos by labelling 2.4K clips of stable grasps from the EPIC-KITCHENS dataset. Our proposed EPIC-Grasps dataset includes 390 object instances of 9 categories, featuring stable grasps from videos of daily interactions in 141 environments. Our method achieves significantly better HO-SGR, both qualitatively and quantitatively, as measured by the stable grasp area and by the overlap between 2D projections and mask labels.
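To make the 1 DoF idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each frame's object rotation relative to the hand is available as a 3x3 matrix, and that a candidate shared axis is given. It scores how far a set of relative rotations departs from rotation about that single axis. The function names (`rodrigues`, `rotation_vector`, `off_axis_angle`, `one_dof_penalty`) and the squared-angle penalty are illustrative assumptions.

```python
import math


def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def rotation_vector(R):
    """Axis-angle vector (log map) of R; intended for angles below pi."""
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-9:
        return [0.0, 0.0, 0.0]
    k = theta / (2.0 * math.sin(theta))
    return [
        k * (R[2][1] - R[1][2]),
        k * (R[0][2] - R[2][0]),
        k * (R[1][0] - R[0][1]),
    ]


def off_axis_angle(R, axis):
    """Size (radians) of the part of R's rotation vector orthogonal to `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    u = [a / n for a in axis]
    w = rotation_vector(R)
    along = sum(wi * ui for wi, ui in zip(w, u))
    perp = [wi - along * ui for wi, ui in zip(w, u)]
    return math.sqrt(sum(p * p for p in perp))


def one_dof_penalty(relative_rotations, axis):
    """Hypothetical penalty: sum of squared off-axis angles over all frames."""
    return sum(off_axis_angle(R, axis) ** 2 for R in relative_rotations)


# Frames that rotate only about the z-axis incur (numerically) zero penalty.
frames = [rodrigues((0.0, 0.0, 1.0), a) for a in (0.1, 0.5, 1.0)]
penalty = one_dof_penalty(frames, (0.0, 0.0, 1.0))
```

In a joint optimisation, a term of this kind could be added to the per-frame reconstruction losses, with the shared axis treated as a latent variable rather than fixed in advance as it is here.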