Reconstructing interacting hands from monocular RGB data is challenging because it involves many interfering factors, e.g., self- and mutual occlusion and similar textures. Previous works leverage information from only a single RGB image and do not model the physically plausible relationship between the two hands, which leads to inferior reconstruction results. In this work, we explicitly exploit spatial-temporal information to achieve better interacting hand reconstruction. On the one hand, we leverage temporal context to compensate for the insufficient information provided by a single frame, and design a novel temporal framework with a temporal constraint that enforces smooth interacting hand motion. On the other hand, we propose an interpenetration detection module that produces physically plausible interacting hands without collisions between them. Extensive experiments validate the effectiveness of the proposed framework, which achieves new state-of-the-art performance on public benchmarks.
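The two mechanisms named above, a temporal smoothness constraint and interpenetration detection, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes 3D joint trajectories of shape (T, J, 3) and measures smoothness as the mean squared second-order finite difference (acceleration). It also approximates each hand by a hypothetical set of spheres and flags interpenetration whenever a sphere of one hand overlaps a sphere of the other. All function names are illustrative.

```python
import numpy as np


def temporal_smoothness_loss(joints: np.ndarray) -> float:
    """Mean squared acceleration of 3D joint trajectories.

    joints: array of shape (T, J, 3), i.e. T frames of J joints.
    Acceleration is the second-order finite difference
    x[t+1] - 2 * x[t] + x[t-1]; constant-velocity motion costs nothing.
    """
    if joints.shape[0] < 3:
        return 0.0  # not enough frames to form a second difference
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))


def interpenetration_penalty(centers_a: np.ndarray, radii_a: np.ndarray,
                             centers_b: np.ndarray, radii_b: np.ndarray) -> float:
    """Total overlap depth between two hands, each approximated by spheres.

    centers_*: (N, 3) sphere centers; radii_*: (N,) sphere radii.
    Two spheres penetrate when their center distance is smaller
    than the sum of their radii.
    """
    diff = centers_a[:, None, :] - centers_b[None, :, :]  # (Na, Nb, 3)
    dist = np.linalg.norm(diff, axis=-1)                 # (Na, Nb)
    overlap = radii_a[:, None] + radii_b[None, :] - dist
    return float(np.sum(np.clip(overlap, 0.0, None)))    # ignore separated pairs


def has_collision(centers_a, radii_a, centers_b, radii_b) -> bool:
    """Detect any interpenetration between the two sphere sets."""
    return interpenetration_penalty(centers_a, radii_a, centers_b, radii_b) > 0.0


if __name__ == "__main__":
    # A single joint moving at constant velocity has zero acceleration.
    t = np.arange(5, dtype=float)
    linear = np.stack([t, 0.0 * t, 0.0 * t], axis=-1)[:, None, :]  # (5, 1, 3)
    print(temporal_smoothness_loss(linear))  # 0.0

    # Two unit spheres 1.5 apart overlap by 0.5.
    a_c, a_r = np.array([[0.0, 0.0, 0.0]]), np.array([1.0])
    b_c, b_r = np.array([[1.5, 0.0, 0.0]]), np.array([1.0])
    print(interpenetration_penalty(a_c, a_r, b_c, b_r))  # 0.5
```

In a learning setting, both quantities would typically be used as differentiable loss terms added to the reconstruction objective rather than as hard checks. A sphere proxy is only one cheap choice; mesh-based signed-distance tests are a common, more precise alternative.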