We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersections or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous works focus on static grasps and contacts. At the core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. A TOCH field is a point-wise, object-centric representation that encodes the position of the hand relative to the object. Leveraging this representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we demonstrate quantitatively and qualitatively its usefulness for correcting erroneous sequences from off-the-shelf RGB/RGB-D hand-object reconstruction methods and for transferring grasps across objects.
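To make the idea of a point-wise, object-centric encoding more concrete, the following is a minimal sketch and not the TOCH field definition itself, which the abstract does not specify. It assumes a nearest-neighbour correspondence between object points and hand points; the function name `object_centric_field`, the `max_dist` threshold, and the five-channel layout are all hypothetical choices made for illustration.

```python
import numpy as np

def object_centric_field(object_points, hand_points_seq, max_dist=0.1):
    """Simplified stand-in for a point-wise, object-centric hand encoding.

    Assumption: correspondences are nearest neighbours within max_dist;
    the actual TOCH correspondence rule may differ.

    object_points:   (N, 3) points sampled on the object, in the object frame.
    hand_points_seq: (T, M, 3) hand surface points per frame, in the same frame.
    Returns (T, N, 5): [has_correspondence, distance, offset_x, offset_y, offset_z].
    """
    T = hand_points_seq.shape[0]
    N = object_points.shape[0]
    field = np.zeros((T, N, 5), dtype=np.float64)
    rows = np.arange(N)
    for t in range(T):
        # Offsets from every object point to every hand point: (N, M, 3).
        diff = hand_points_seq[t][None, :, :] - object_points[:, None, :]
        dist = np.linalg.norm(diff, axis=-1)   # (N, M)
        nearest = dist.argmin(axis=1)          # closest hand point per object point
        d = dist[rows, nearest]
        offset = diff[rows, nearest]
        mask = d <= max_dist                   # keep only nearby correspondences
        field[t, :, 0] = mask
        field[t, :, 1] = np.where(mask, d, 0.0)
        field[t, :, 2:] = offset * mask[:, None]
    return field

# Toy example: two object points, one hand point approaching and then leaving.
obj = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
hand = np.array([[[0.0, 0.0, 0.05]], [[0.0, 0.0, 0.5]]])
field = object_centric_field(obj, hand)
```

Because every frame is expressed per object point, a sequence of such fields has a fixed shape for a given object sampling, which is the kind of input a temporal denoising auto-encoder could consume.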