Reconstructing hand-held objects from monocular RGB images is an appealing yet challenging task. In this task, contacts between hands and objects provide important cues for recovering the 3D geometry of the hand-held objects. Although recent works have employed implicit functions and achieved impressive progress, they do not formulate contacts in their frameworks, which leads to less realistic object meshes. In this work, we explore how to model contacts explicitly to benefit the implicit reconstruction of hand-held objects. Our method consists of two components: explicit contact prediction and implicit shape reconstruction. In the first component, we propose a new subtask of directly estimating 3D hand-object contacts from a single image. Part-level and vertex-level graph-based transformers are cascaded and jointly learned in a coarse-to-fine manner to predict more accurate contact probabilities. In the second component, we introduce a novel method that diffuses the estimated contact states from the hand mesh surface to the nearby 3D space and uses the diffused contact probabilities to construct the implicit neural representation of the manipulated object. By estimating the interaction patterns between the hand and the object, our method reconstructs more realistic object meshes, especially for the object parts in contact with the hand. Extensive experiments on challenging benchmarks show that the proposed method outperforms current state-of-the-art methods by a large margin.
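To make the idea of diffusing contact states from the hand surface into nearby 3D space more concrete, here is a minimal sketch. The abstract does not specify the diffusion scheme, so the Gaussian distance falloff, the max-over-vertices aggregation, and the names `diffuse_contacts` and `sigma` are hypothetical assumptions, not the authors' method. Each 3D query point receives the strongest per-vertex contact probability, attenuated by its distance to that hand vertex.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def diffuse_contacts(
    hand_vertices: Sequence[Point],
    contact_probs: Sequence[float],
    query_points: Sequence[Point],
    sigma: float = 0.01,
) -> List[float]:
    """Spread per-vertex contact probabilities to arbitrary 3D query points.

    Hypothetical scheme (assumption, not the paper's formulation): each query
    point takes max_v p_v * exp(-||q - v||^2 / (2 * sigma^2)) over hand vertices v.
    """
    if len(hand_vertices) != len(contact_probs):
        raise ValueError("need one contact probability per hand vertex")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    two_sigma_sq = 2.0 * sigma * sigma
    diffused = []
    for q in query_points:
        best = 0.0
        for v, p in zip(hand_vertices, contact_probs):
            # Squared Euclidean distance between the query point and the vertex.
            d_sq = sum((qi - vi) ** 2 for qi, vi in zip(q, v))
            # Contact evidence decays smoothly with distance from the hand surface.
            best = max(best, p * math.exp(-d_sq / two_sigma_sq))
        diffused.append(best)
    return diffused


# Usage: a likely contact vertex at the origin and an unlikely one 10 cm away.
probs = diffuse_contacts(
    hand_vertices=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    contact_probs=[0.9, 0.1],
    query_points=[(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (1.0, 0.0, 0.0)],
)
print(probs)
```

In such a scheme, the diffused value could serve as an extra per-point feature for an implicit shape decoder; `sigma` controls how far contact evidence reaches from the hand surface. A max aggregation is used here so that a single confident contact vertex is not diluted by many distant non-contact vertices.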