This paper presents a method that learns a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. Both inference and training-data generation for 3D hand-object scene reconstruction are challenging because of the depth ambiguity of a single image and the occlusions caused by the hand and object. We turn this challenge into an opportunity by using the hand shape to constrain the possible relative configurations of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation between 3D hand shape features and 2D object features to predict the hand and object scene geometry. In experiments on real-world datasets, we show that HandNeRF reconstructs hand-object scenes with novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF enables more accurate execution of downstream tasks, such as grasping and motion planning for robotic hand-over and manipulation. The code is available at https://github.com/SamsungLabs/HandNeRF