Reconstructing hand-held objects from a single RGB image is an important and challenging problem. Existing methods based on Signed Distance Fields (SDF) struggle to capture complex hand-object interactions comprehensively: an SDF is reliable only in the proximity of the target, which makes it infeasible to encode local hand and object cues simultaneously. To address this issue, we propose DDF-HO, a novel approach that uses the Directed Distance Field (DDF) as the shape representation. Unlike an SDF, a DDF maps a ray in 3D space, defined by an origin and a direction, to a pair of values: a binary visibility signal indicating whether the ray intersects the object, and a distance value measuring the distance from the origin to the target along the given direction. We randomly sample multiple rays and collect local-to-global geometric features for them by introducing a novel 2D ray-based feature aggregation scheme and a 3D intersection-aware hand pose embedding, combining 2D and 3D features to model hand-object interactions. Extensive experiments on synthetic and real-world datasets demonstrate that DDF-HO consistently outperforms all baseline methods by a large margin, most notably in Chamfer Distance, where it improves by about 80%. Code is available at https://github.com/ZhangCYG/DDFHO.
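To make the difference between the two representations concrete, the following is a minimal sketch that evaluates both on an analytic sphere. It is not the learned DDF of DDF-HO: the sphere stands in for the target object, and the function names `sphere_sdf` and `sphere_ddf` are hypothetical, chosen only for illustration. The point is the interface: an SDF takes a point and returns one signed distance, while a DDF takes a ray (origin and direction) and returns a visibility flag together with a distance along that ray.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each point to the sphere surface (negative inside)."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sphere_ddf(origins, directions, center, radius):
    """Toy DDF for a sphere (illustrative assumption, not the paper's network).

    Returns, per ray, a boolean visibility flag and the distance from the
    origin to the first surface hit along the ray (inf when there is no hit).
    """
    # Normalize directions so the ray parameter t is a metric distance.
    d = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    oc = origins - center
    # Solve |oc + t * d|^2 = radius^2, i.e. t^2 + 2*b*t + c = 0.
    b = np.sum(oc * d, axis=-1)
    c = np.sum(oc * oc, axis=-1) - radius ** 2
    disc = b * b - c
    sqrt_disc = np.sqrt(np.maximum(disc, 0.0))
    t_near = -b - sqrt_disc
    t_far = -b + sqrt_disc
    # Use the near root when it lies ahead of the origin; otherwise the far
    # root, which covers origins located inside the sphere.
    t = np.where(t_near >= 0.0, t_near, t_far)
    visible = (disc >= 0.0) & (t >= 0.0)
    distance = np.where(visible, t, np.inf)
    return visible, distance

# Three rays from the same origin: toward the sphere, sideways, and away.
center = np.zeros(3)
origins = np.array([[0.0, 0.0, -3.0]] * 3)
directions = np.array([[0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, -1.0]])
visible, distance = sphere_ddf(origins, directions, center, 1.0)
sdf_value = sphere_sdf(origins[0], center, 1.0)
```

In this sketch only the first ray is marked visible, even though all three share the same origin and therefore the same SDF value. That direction dependence is what allows a ray-based representation to be queried far from the surface.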