We present ViTaM-D, a novel visual-tactile framework for reconstructing dynamic hand-object interactions that integrates distributed tactile sensing for more accurate contact modeling. Existing methods rely primarily on visual input and struggle to capture detailed contact interactions such as object deformation. Our approach addresses this limitation with distributed tactile sensors and DF-Field, a distributed force-aware contact representation that models both the kinetic and potential energy of hand-object interaction. ViTaM-D first reconstructs hand-object interactions with a visual-only network, VDT-Net, and then refines contact details through a force-aware optimization (FO) process, improving object deformation modeling. To benchmark our approach, we introduce the HOT dataset, which contains 600 sequences of hand-object interactions, including deformable objects, built in a high-precision simulation environment. Extensive experiments on the DexYCB and HOT datasets demonstrate significant accuracy improvements over previous state-of-the-art methods such as gSDF and HOTrack. Our results highlight the superior performance of ViTaM-D on both rigid and deformable object reconstruction, as well as the effectiveness of DF-Field in refining hand poses. This work offers a comprehensive solution to dynamic hand-object interaction reconstruction by seamlessly integrating visual and tactile data. Code, models, and datasets will be made publicly available.