Reconstructing hand-held objects from a single RGB image is an important and challenging problem. Existing methods based on Signed Distance Fields (SDFs) struggle to capture complex hand-object interactions comprehensively: an SDF is reliable only in the proximity of the target, so it cannot simultaneously encode local hand and object cues. To address this issue, we propose DDF-HO, a novel approach that uses the Directed Distance Field (DDF) as the shape representation. Unlike an SDF, a DDF maps a ray in 3D space, consisting of an origin and a direction, to corresponding DDF values: a binary visibility signal indicating whether the ray intersects the object, and a distance value measuring the distance from the origin to the target along the given direction. We randomly sample multiple rays and collect local-to-global geometric features for them by introducing a novel 2D ray-based feature aggregation scheme and a 3D intersection-aware hand pose embedding, combining 2D and 3D features to model hand-object interactions. Extensive experiments on synthetic and real-world datasets demonstrate that DDF-HO consistently outperforms all baseline methods by a large margin, most notably in Chamfer Distance, with an improvement of about $80\%$. Code is available at \url{https://github.com/ZhangCYG/DDFHO}.
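To make the DDF definition concrete, the following minimal sketch evaluates an analytic DDF for a sphere: given a ray origin and direction, it returns the binary visibility signal and the distance to the first intersection. This is only an illustration of the representation under stated assumptions; it is not the learned network of DDF-HO, and the function name `sphere_ddf`, its parameters, and the choice of a sphere as the target are hypothetical.

```python
import math

def sphere_ddf(origin, direction, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic DDF of a sphere (illustrative target, not the paper's model).

    Maps a ray (origin, direction) to (visible, distance):
    visible  -- True if the ray hits the sphere in front of the origin
    distance -- distance from the origin to the first hit along the ray,
                or math.inf when the ray misses
    """
    # Normalize the direction so the ray parameter t is a metric distance.
    norm = math.sqrt(sum(x * x for x in direction))
    d = [x / norm for x in direction]

    # Solve |origin + t*d - center|^2 = radius^2 for t.
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(p * q for p, q in zip(oc, d))
    c = sum(p * p for p in oc) - radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return False, math.inf  # the ray's line never touches the sphere

    root = math.sqrt(disc)
    t = -b - root               # nearer intersection
    if t < 0.0:
        t = -b + root           # origin is inside the sphere
    if t < 0.0:
        return False, math.inf  # sphere lies entirely behind the origin
    return True, t


# A ray starting 3 units in front of a unit sphere and pointing at its center.
visible, distance = sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
```

Unlike an SDF query, which takes only a point, the query here depends on the direction as well: the same origin yields a hit for one direction and a miss for another.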