Reconstructing hand-held objects from a single RGB image is an important and challenging problem. Existing methods based on Signed Distance Fields (SDF) struggle to capture complex hand-object interactions comprehensively: an SDF is reliable only in the proximity of the target, so it cannot simultaneously encode local hand and object cues. To address this issue, we propose DDF-HO, a novel approach that uses the Directed Distance Field (DDF) as the shape representation. Unlike SDF, DDF maps a ray in 3D space, defined by an origin and a direction, to its DDF values: a binary visibility signal indicating whether the ray intersects the object, and a distance value measuring the distance from the origin to the target along the given direction. We randomly sample multiple rays and collect local-to-global geometric features for them by introducing a novel 2D ray-based feature aggregation scheme and a 3D intersection-aware hand pose embedding, combining 2D and 3D features to model hand-object interactions. Extensive experiments on synthetic and real-world datasets demonstrate that DDF-HO consistently outperforms all baseline methods by a large margin, most notably in Chamfer Distance, where it improves by about 80%. Code and trained models will be released soon.
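To make the difference between the two representations concrete, the following is a minimal sketch of a DDF and an SDF for a single analytic sphere. It is an illustration only, not the DDF-HO model: in DDF-HO the ray-to-value mapping is predicted from image and hand-pose features, whereas here it is computed in closed form. The function names `sphere_ddf` and `sphere_sdf`, the unit sphere, and the choice to return `math.inf` for invisible rays are all assumptions made for this example.

```python
import math


def sphere_ddf(origin, direction, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic DDF of a sphere (hypothetical helper, for illustration).

    Maps a ray (origin, direction) to a pair (visible, distance):
    visible  - True if the ray hits the sphere in front of the origin
    distance - distance from the origin to the first hit along the ray,
               or math.inf when the ray does not hit (assumed convention)
    """
    # Normalize the direction so the ray parameter t is a metric distance.
    norm = math.sqrt(sum(x * x for x in direction))
    d = [x / norm for x in direction]
    oc = [o - c for o, c in zip(origin, center)]

    # Solve |oc + t * d|^2 = radius^2, i.e. t^2 + 2*b*t + c = 0.
    b = sum(di * oi for di, oi in zip(d, oc))
    c = sum(oi * oi for oi in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False, math.inf  # the ray's line misses the sphere

    root = math.sqrt(disc)
    t = -b - root  # nearer intersection
    if t < 0.0:
        t = -b + root  # origin is inside the sphere: use the exit point
    if t < 0.0:
        return False, math.inf  # sphere lies entirely behind the origin
    return True, t


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic SDF of a sphere: signed distance to the nearest surface point."""
    dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
    return dist - radius


# A ray that starts 3 units in front of the unit sphere and points at it.
print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))   # (True, 2.0)
# The same origin with the direction reversed never reaches the sphere.
print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, -1.0)))  # (False, inf)
```

The SDF depends only on the query point, while the DDF also depends on the viewing direction, so a single origin yields different values for different rays. This direction dependence is the property the abstract relies on when it samples many rays per image.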