Modeling hand-object interactions is a fundamentally challenging task in 3D computer vision. Despite remarkable progress in this field, existing methods still fail to synthesize hand-object interactions photo-realistically. Their rendering quality degrades because of heavy mutual occlusions between the hand and the object and because of inaccurate hand-object pose estimation. To tackle these challenges, we present Neural Contact Radiance Field (NCRF), a novel free-viewpoint rendering framework that reconstructs hand-object interactions from a sparse set of videos. NCRF consists of two key components: (a) a contact optimization field that predicts an accurate contact field from 3D query points to achieve desirable contact between the hand and the object; and (b) a hand-object neural radiance field that learns an implicit hand-object representation in a static canonical space, in concert with a specifically designed hand-object motion field that produces observation-to-canonical correspondences. We learn these components jointly, so that they mutually help and regularize each other through visual and geometric constraints. The result is a high-quality hand-object reconstruction that enables photo-realistic novel view synthesis. Extensive experiments on the HO3D and DexYCB datasets show that our approach outperforms current state-of-the-art methods in both rendering quality and pose estimation accuracy.
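To make the canonical-space idea in component (b) more concrete, here is a minimal NumPy sketch under stated assumptions. It is not the NCRF implementation. It assumes the object moves rigidly with a known pose (R, t), so a point observed at x_obs maps to the canonical frame as R^T(x_obs - t). The helpers `observation_to_canonical`, `toy_canonical_field`, and `render_ray` are hypothetical, and a toy red sphere stands in for the learned canonical radiance field. Samples along a camera ray are warped into canonical space, queried there, and alpha-composited with standard NeRF-style volume rendering. The articulated hand motion field and the contact field are not modeled.

```python
import numpy as np


def observation_to_canonical(points, R, t):
    """Map (N, 3) observation-space points into the object's canonical frame.

    Assumption: rigid motion x_obs = R @ x_can + t, hence x_can = R^T (x_obs - t).
    With points stored as rows, (x - t) @ R equals (R^T (x - t))^T.
    """
    return (points - t) @ R


def toy_canonical_field(x_can):
    """Hypothetical stand-in for a learned canonical radiance field.

    A solid unit sphere at the canonical origin: density 10 inside, 0 outside,
    with a constant red color everywhere.
    """
    inside = np.linalg.norm(x_can, axis=-1) < 1.0
    sigma = np.where(inside, 10.0, 0.0)
    rgb = np.tile(np.array([1.0, 0.0, 0.0]), (x_can.shape[0], 1))
    return sigma, rgb


def render_ray(origin, direction, R, t, near=0.0, far=6.0, n_samples=128):
    """Volume-render one ray through the warped canonical field.

    Returns the composited RGB color and the accumulated opacity.
    """
    ts = np.linspace(near, far, n_samples)
    pts = origin + ts[:, None] * direction           # samples in observation space
    x_can = observation_to_canonical(pts, R, t)      # warp to canonical space
    sigma, rgb = toy_canonical_field(x_can)          # query the static field

    deltas = np.append(np.diff(ts), 1e10)            # last interval treated as infinite
    alpha = 1.0 - np.exp(-sigma * deltas)            # per-sample opacity
    # Transmittance T_i = prod_{j < i} (1 - alpha_j)
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = alpha * trans
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights.sum()


# Usage: the object sits 3 units along +z; a camera at the origin looks along +z.
R = np.eye(3)
t = np.array([0.0, 0.0, 3.0])
color, acc = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), R, t)
```

Because the field is queried only in canonical space, moving the object changes only (R, t), not the field itself. A ray aimed at the object's observed position accumulates opacity close to 1, while a ray that misses it accumulates 0.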