Modeling hand-object interactions is a fundamentally challenging task in 3D computer vision. Despite remarkable progress in this field, existing methods still fail to synthesize hand-object interactions photo-realistically: rendering quality degrades under the heavy mutual occlusion between the hand and the object, and under inaccurate hand-object pose estimation. To tackle these challenges, we present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), which reconstructs hand-object interactions from a sparse set of videos. The NCRF framework consists of two key components: (a) a contact optimization field that predicts an accurate contact field from 3D query points, so as to achieve desirable contact between the hand and the object; and (b) a hand-object neural radiance field that learns an implicit hand-object representation in a static canonical space, in concert with a specifically designed hand-object motion field that produces observation-to-canonical correspondences. We learn these components jointly, so that they help and regularize each other through visual and geometric constraints, producing a high-quality hand-object reconstruction that supports photo-realistic novel view synthesis. Extensive experiments on the HO3D and DexYCB datasets show that our approach outperforms the current state of the art in both rendering quality and pose estimation accuracy.
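To make the notion of an observation-to-canonical correspondence concrete, the following is a minimal sketch and not the NCRF motion field itself. It assumes the simplest possible case: a rigid object whose per-frame pose is given by a rotation `R` and translation `t`, so that `x_obs = R @ x_can + t`. The function name `observation_to_canonical` and the pose values are hypothetical; an articulated hand would need a richer deformation model than this rigid warp.

```python
import numpy as np


def observation_to_canonical(points_obs, rotation, translation):
    """Map (N, 3) observation-space points into a static canonical frame.

    Assumes a rigid pose with x_obs = R @ x_can + t, so the inverse is
    x_can = R^T @ (x_obs - t). This is an illustrative assumption only.
    """
    # For row vectors, R^T @ v is written as v @ R.
    return (points_obs - translation) @ rotation


# Hypothetical pose: 90-degree rotation about the z-axis plus a shift.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

# Canonical query points and their positions in the observed frame.
x_can = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.5, 2.0]])
x_obs = x_can @ R.T + t

# Warping the observed points back should recover the canonical points.
x_back = observation_to_canonical(x_obs, R, t)
```

In a canonical-space radiance field, points sampled along camera rays in each observed frame would be warped in this spirit before the canonical representation is queried, which is what lets a single static representation explain every frame.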